STATISTICAL THEORY OF PROPHYLACTIC 
AND THERAPEUTIC TRIALS 
I. LIMITATIONS OF THE UNIQUE NULL HYPOTHESIS 


BY 
LANCELOT HOGBEN AND RAYMOND WRIGHTON 


Department of Medical Statistics, University of Birmingham 


1. PRELIMINARY CONSIDERATIONS 
Until a comparatively recent date, available literature on the efficacy of therapeutic 
procedures exclusively recorded the outcome of a follow-up at consultant level, based 
as such on case histories of patients willing to testify or report for examination. 
Commonly with no control group for comparison, commonly also with scant regard 
for biassed selection of cases treated, and rarely undertaken with precautions to 
validify the testimony of the patient or the clinical judgement of the author, a clinical 
trial so conceived violates any or all of three canons of scientific method, as is now 
becoming recognized widely in all branches of medicine, except perhaps psychiatry. 
A change of outlook is largely due to the impact of more exacting standards of 
evidence established at an earlier date in connexion with the assessment of prophylactic 
measures, partly as a consequence of public controversy over the merits of vaccination. 
In this context, the term prophylactic calls for no comment. We here employ the 

expression therapeutic measures in the widest sense, including administration of 
drugs or convalescent sera, operative and manipulative surgery, diathermy and 
radiation (deep X-ray, short wave, radium) treatments, occupational and physio- 
therapy (including remedial gymnastics, faradization, massage), rehabilitation 
techniques. Bradford Hill (1951), who has himself directed a series of therapeutic 
trials on the now familiar pattern expounded by Greenwood (1935), has lately set 
forth in clear and simple language some of the essential safeguards of a scientific 
assessment of remedial measures; and there is no need to recapitulate them in this 
context. Our aim in what follows is to examine statistical procedures invoked to 
validate results within a framework of the precautions to which he has drawn attention: 
but it will simplify our task if we first specify the desiderata. Prophylactic or thera- 
peutic trials alike pre-suppose: 

(i) a prescription of treatments: 

(ii) a yardstick of comparison, i.e. a control group untreated or treated by a method 

alternative to the test method; 

(iii) a criterion or criteria of efficacy : 

(iv) a method of scoring (iii): 

(v) an agreed basis for selection of test subjects; 

(vi) a statistical procedure for validification of (ii), (iv), and the conclusions suggested 

by the method of scoring employed. 
The treatment prescription is worthy of comment only in so far as it sets a limit 
to the legitimacy of any positive assertion we may make. Thus it may specify different 
drugs, different dosages of the same drug, or the same drug with different ancillaries 
(e.g. differential regulation of fluid intake). If we merely specify Treatment A and 
Treatment B in terms of two different drugs, the outcome of our trial can merely 
tell us, if anything, whether one drug is preferable to the other within the framework 
of the prescribed dosage. This is not a statistical issue, and need not detain us. 
With respect to the second, there is little to add to the canons which Bradford Hill 
has propounded. The third calls for comment because the choice of the criterion 
and the assessment of its value may raise issues of statistical as well as of clinical 
interest. Briefly, we may classify our criteria of efficacy as follows: 

(1) Attack rate (prophylaxis only) 
(2) Severity: 


(a) mortality 

(b) complications 

(c) sequelae 

(d) particular signs and/or symptoms 


(3) Laboratory (biochemical, pathological, radiological, haematological, etc.) or other 
objective (height, weight, body temperature) tests 
(4) Clinical judgement 
(5) Patient's subsequent testimony 
(6) Duration of treatment, signs or symptoms 
(7) Relapse rate 
Of the foregoing, (3) raises ad hoc issues of statistical analysis mentioned at a 
later stage. Both (4) and (5) will be more or less valuable or worthless in so far as the 
design of the trial takes within its scope provision for statistical assessment of their 
reliability. On this topic, there is again little to add to what Bradford Hill has written 
about it. It is pertinent to remark that: 
(a) duration of stay in hospital has special pitfalls, for reasons adequately set forth by 
Mackay (1951): 
(b) the reason for putting (7) last on our list is a reminder that recourse to only one 
criterion of efficacy may lead to a distorted view of the merits of a new treatment. 
The method of scoring is of more direct consequence for what follows. Among 
ways in which we may express the outcome of a trial in numerical terms amenable 
to statistical treatment, it is customary to distinguish two main types, subsumed by 
the terms statistics of attributes and statistics of measurement, or, more briefly, 
qualitative and quantitative statistics. Each form of words is misleading and we shall 
here adopt the following dichotomy: 
(i) Taxonomic scoring, when we specify a group numerically by the number(s) of 
individuals assignable to a class (or classes): 
(ii) Representative scoring, when we specify a group numerically by some average 
of a range of score values individually assignable to its members. 
Under (i) our criterion of class membership may be qualitative or quantitative. 
Under (ii) the score assigned to each individual of the group may be a count or a 
measurement. Neither the dichotomy qualitative-quantitative nor the dichotomy 
attribute-measurement therefore conveys clearly the essential distinction which the 
following example illustrates. If we score a group of patients by the number or 
proportion of individuals with an r-b-c of over 4 × 10⁶ per c.mm., our method of 
scoring is taxonomic, but our class criterion is essentially quantitative. If we score 
the same group by the mean number of red cells per c.mm., our method of scoring 
will be representative, but the incorporated scores assigned to the individual will not 
be measurements sensu stricto. 
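
The contrast between the two methods of scoring may be made concrete by a short sketch. The red cell counts below are invented purely for illustration and are not data from any trial; the names `counts`, `threshold`, `taxonomic_score`, and `representative_score` are likewise our own.

```python
# Hypothetical red cell counts per c.mm. for six patients (illustrative only).
counts = [3_600_000, 4_200_000, 4_800_000, 3_900_000, 5_100_000, 4_400_000]
threshold = 4_000_000  # class criterion: more than 4 x 10^6 cells per c.mm.

# Taxonomic scoring: the number (or proportion) of individuals assigned to the class.
taxonomic_score = sum(1 for c in counts if c > threshold)
taxonomic_proportion = taxonomic_score / len(counts)

# Representative scoring: an average of the individual score values.
representative_score = sum(counts) / len(counts)

print(taxonomic_score, round(taxonomic_proportion, 3), round(representative_score))
```

The same six counts thus yield either a class frequency or a group mean, according to the method of scoring adopted.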

To a large extent the criterion of efficacy will determine the choice of the method 
of scoring. Thus (1), (2), (3), and (7) above necessarily invoke the taxonomic method; 
but either method is applicable to (6) and in some measure to (3), e.g. antitoxin titre 
in prophylactic trials. Where choice is open, as the foregoing remarks on the r-b-c 
illustrate, the issue is not primarily statistical; and it is not trivial to say so, since the 
statistician is all too prone to assume the contrary. In some situations, the representative 
method may have much to commend it in terms of available statistical techniques. 
The statistician, off his guard, may therefore be likely to prefer it as more sensitive in 
his own jargon; but the enumeration of individuals will be commonly more con- 
sistent with the humane end in view than will a method of scoring which records the 
average of the treatment group as a whole. 

As stated, a satisfactory basis for selection of personnel raises issues with the 
claims of which the consultant clinician, accustomed to contact with a highly selected 
population of sick persons and with few opportunities for examining a control group, 
has only lately come to terms. For the most part, these are commonplace to the 
laboratory worker trained in the discipline of controlled experiment; but there is 
still need to emphasize an antinomy inherent in the practice of the laboratory, and 
statistical principles sometimes invoked unjustifiably in theoretical analysis of 
experimental data. 

When we compare two treatments we may proceed in two ways. One is to choose 
individuals for each treatment group at random on the explicit assumption that each 
such group is a representative sample of a homogeneous universe*. The other is to 
pair off individuals sharing common characteristics (e.g. age, sex, body build), or 
successive observations referable to one and the same individual, on the implicit 
assumption that each individual or observation referable to one and the same 
treatment group is a unit sample from a stratified universe. If we anticipate that the 
outcome of such pairing will be a contrast so clear-cut as to dispense with the need 
for statistical validification, the design of a laboratory experiment or of a field trial in 
conformity with the method of stratified sampling has everything to commend it; 
and the inclinations of the experimentalist will commonly favour the procedure. 

* It is important to realize that we here speak of a universe as homogeneous, or contrariwise as stratified, 
in the statistical connotation of the terms. If we classify a population by age groups and take samples of 
predetermined size from each, we may legitimately speak of our sampling system and/or universe as stratified 
in a descriptive sense consistent with the possibility that the expected sample score is the same for each age 
group (sub-universe or stratum). If each stratum of the universe is indefinitely large and identical with 
every other one with respect to relevant parameters, the effect of successively combining unit samples from r 
such strata is statistically equivalent to that of successively taking an r-fold sample from any one of them. 
From the statistical viewpoint we then regard the universe as homogeneous. 
What too few laboratory or field workers sufficiently realize is that: 
(a) the algebraic implications of stratified sampling are often intractable; 
(b) many prescribed statistical techniques are valid only if we postulate a homogeneous 
universe as the source of each sample available for comparison. 

In a later contribution in this series, we may have occasion to explore this aspect 
of the design of a trial more fully; and to clarify the proper choice of a technique of 
validification with special reference to circumstances in which pairing is or is not con- 
sistent with its essential postulates. In any case, we have still to consider the rival claims 
of alternative methods of validification. These we may provisionally distinguish as: 

(a) decision tests, ostensibly devised to adjudicate upon the merits of particular 
hypotheses ; 

(b) techniques of estimation, the aim of which is to make legitimate statements about 
a parameter (or parameters) of a universe or sub-universe on the basis of 
information contained in a sample. 

At this stage, we intentionally sharpen a distinction which we shall refine in a later 
communication. Here it suffices to say that we might alternatively define (b) as setting 
limits to what hypotheses are admissible on the basis of a set of observations. The 
choice of the word merits in preference to truth is also deliberate, inasmuch as it is 
necessary to make a rough and ready distinction* between different targets of statistical 
reasoning in the domain of test decisions: 

(i) to decide whether to regard a particular hypothesis as correct or false: 
(ii) to limit the risk incurred by rejecting it, if it is indeed correct. 

As we shall subsequently make more explicit, the limitation of risks incurred by 
rejecting either of a particular pair of hypotheses if true gives no assurance of a high 
probability of decision in favour of a third one which happens to be correct. It is there- 
fore useful to make a broad distinction between statistical inference as conditional or 
unconditional. With this end in view it is here fitting to make the meaning of the 
term statistical inference as explicit as need be. In the most exacting sense of the 
term, and as the writers would prefer to use it exclusively, statistical inference has as 
its end in view† an unequivocal assertion coupled with a numerical specification of 
the long-run frequency of its truth within an assumed framework of indefinitely 
protracted repetition. We may speak of this numerical specification as the uncertainty 
safeguard, and may then distinguish the two types of statistical inference mentioned 
above as follows: 

(I) unconditional, if the uncertainty safeguard unconditionally specifies the probability 
of the falsity of the assertion: 
(II) conditional, if the uncertainty safeguard specifies the probability of the falsity of 
the assertion within a framework of explicitly stated conditions independent of 
the data. 

* Anscombe (1951) makes a three-fold distinction: 

It is worthwhile to distinguish different purposes one may have in accepting a hypothesis: (i) to base an administrative 
decision on, (ii) for further testing and confirmation, (iii) for acceptance into the corpus of scientific knowledge, to be 
relied on in future work. There are risks, variously assessable, in coming to decisions in all three cases. For example, in 
case (iii), if the hypothesis is later found to be seriously false a lot of effort in investigating other points may have been 
wasted. Just as with prior confidences, risks are rather vague in magnitude, but in a formal theory it would be tempting 
to postulate a complete numerical risk-function. 

† Some contemporary writers use it in a more extended sense, embracing situations in which the 
conditional framework of (II) involves a specification referable to the data themselves, and definitions of 
probability referable to introspective judgments unrelated to the long-term frequency of correct assertion. 

In symbolic form we may express the unconditional probability of false assertion 
as $P_f = (1 - P_t)$, and the conditional probability of false assertion within the framework 
of a particular Hypothesis A as $P_{f.a} = (1 - P_{t.a})$. Thus an assertion of the form 
$P_t = x$ is an example of unconditional statistical inference. Besides these two forms 
of statement we may make one of the form $P_t > x$. At the epistemic level of communication 
(Meredith, 1951), the more exact assertion, $P_t = 0.95$, has no priority over 
the less definite assertion, $P_t > 0.95$; and we may prefer to regard a statement 
expressed in the form $P_t > x$ as an example of (I), if we deem $(1 - x)$ to be an 
acceptable level of uncertainty. On the other hand, it serves no useful purpose to 
make an assertion of the form $P_t > 0.30$ if we regard any figure above 5 per cent. as 
an inacceptable level of uncertainty. Hence we shall have no practical interest in 
stating an inference of the form $P_t > x$ unless we should be content with the assertion 
$P_t = x$. Otherwise, any useful statement of statistical inference we may undertake 
conforms to (II). 

In a subsequent communication we shall draw attention to neglected possibilities 
of the use of estimation procedures for the validification of evidence supplied by 
prophylactic or therapeutic trials. In this one our aim is to dispel widely current 
misunderstanding concerning what conclusions we may legitimately draw from the 
outcome of statistical tests in general use. Accordingly, we shall now examine the 
impact on statistical theory of considerations traceable to the practice of quality 
control in commerce and manufacture. Until the works of Wald (1947) and Neyman 
(1950) were published, the new concepts with which we shall here concern ourselves 
were only accessible in such highly specialized publications as those of Neyman (1937), 
Wald (1942), and Eisenhart and others (1947). Even among mathematical statisticians, 
probably only a few have as yet fully digested how radical a revision of widely current 
dogmas concerning the credentials of test procedures the new concepts must 
inevitably provoke when there is wider recognition of their logical implications. 
Since their implications are indeed of the utmost importance to a proper evaluation of 
universally current test procedures in the domain of experiment, we shall not hesitate 
in what follows to make use of numerical examples and simple models to get their 
true meaning into focus. 

While we do not deny the legitimacy of a different approach to our theme in the 
context of the research laboratories of a commercial firm concerned with the manu- 
facture of drugs (vide p. 114 infra), we here conceive our end in view as making 
an unequivocal statement with reference to the relative merits of Treatment A and of 
Treatment B. Inescapably, some sort of uncertainty safeguard should accompany 
such a statement; and the attitude we adopt to the theory of probability will dictate 
its form. In any case, we here assume that the uncertainty safeguard is expressible as 
the probability that our statement is false; and that the measure of this probability 
is referable to a limiting ratio realizable in an infinitely protracted series of trials 
carried out within the same framework. In previous remarks we have drawn a 
clear-cut distinction between conditional and unconditional inference. More precisely, 
four variants of any safeguard conceived in statistical terms, are open for discussion. 
We may be able to cite: 
(a) without restriction the probability that our assertion is false. 
(b) without other restriction that the probability of false statement so defined does 
not exceed an upper limit. 
(c) separately probabilities of false assertion each conditional on one of a comprehensive 
set of prior hypotheses. 
(d) separately also probabilities of false assertion each conditional on one of a set 
which is not comprehensive. 


2. HEURISTIC VALUE OF THE DUAL HYPOTHESIS 

For the past generation research workers engaged on agricultural trials, tests of 
the efficacy of therapeutic or prophylactic measures, sociological field work, and 
bio-assay have relied for validification of their results on tests devised by Fisher (1925) 
and his co-workers in conformity with a familiar pattern. Such tests entail: 

(a) the invocation of a unique so-called null hypothesis which prescribes the 
frequency with which a sample score will lie outside a prescribed limit (or limits); 

(b) the specification* of a criterion of rejection, i.e. the convention to reject the 
hypothesis if the sample score does in fact lie outside the prescribed limit(s). 

Customarily (and oddly) the corresponding limiting frequency adopted is at the 
95 per cent. (approximately 20:1 odds) level; and the possibility of defining it in such 
terms resides in the fact that the unique hypothesis chosen for the purpose has an 
assignable distribution function. 

From one viewpoint, the prevalence of the fashion referred to is understandable. 
The text of Fisher (1925) prepared the way for manuals by Snedecor, Tippett, 
Hagood, Quenouille, and others, exhibiting schemata for computation in conformity 
with the Fisher test prescriptions. By recourse to a wealth of exemplary material 
the research worker willing to take the test prescription on trust can therefore readily, 
it may be all too readily, select a type of specimen at least seemingly like his or her 
own problem. None the less, there must be among those who do so, not a few who 
have felt misgiving for any (or all) of several reasons, notably the following: 

(a) not infrequently the form of the null hypothesis is irrelevant to the main issue, 
as when the decision that two treatment procedures have different results is of 
trivial interest in comparison with the decision that Treatment B is at least so 
much more effective than Treatment A; 

(b) the type of decision which concerns the investigator determines the choice of 
a particular null hypothesis far less than considerations of algebraic convenience 
vis-à-vis the specification of a sample distribution; 

(c) the test prescription takes no stock of any alternative hypothesis which may 
indeed be the main concern of the investigator, as when the disadvantages of 
dismissing a new treatment if more efficacious outweigh the propriety of cautious 
adherence to an established procedure. 

* In a later communication we shall have occasion to emphasize a difference between the opposing 
views of test procedure advanced by Fisher and his adherents on the one hand, and by the school of Neyman, 
Pearson, and Wald on the other. In the terminology of Meredith (1951), their concepts are referable to 
different epistemic levels, Fisher’s to weighing evidence (already obtained), that of the others to prescribing 
rules of inference in conformity with a prescription independent of the observations. For this reason, 
Fisher’s so-called exact 2 × 2 test lies outside the scope of the concepts discussed in Section 3 below. 

The first misgiving has special reference to the domain of estimation, and as such 
to the theory of interval estimation specially associated with the name of Neyman 
(1937). Its relevance to the conduct of prophylactic and therapeutic trials will be 
the theme of a subsequent communication. The second and third raise issues which 
a theory of test procedure also advanced by Neyman and Pearson has brought into 
focus; but up to the present their critique of the unique null hypothesis has exerted 
little influence on research workers outside America. This is less because their 
writings lack the polemic vitality of their predecessors than because the concepts 
invoked are logically subtle and on that account difficult to assimilate unless examined 
against a background of familiar material. What follows claims to little originality, 
the aim being to help the laboratory or the field research worker to recognize pitfalls 
in previously accepted test procedures and to materialize some of the essentially 
novel concepts of the Neyman-Pearson approach. 

One way of doing this is to formulate a biological problem which involves no 
predilection for a single hypothesis such as the null one in virtue of algebraic convenience 
as such. Accordingly, we shall think of a culture of Drosophila containing normal 
females and females which carry a sex-linked lethal gene. With Bacon we shall concede 
that nature is more diverse in her operations than man in his conceptions, but our 
knowledge of laboratory conditions (presumptively highly standardized) will justify 
the provisional assumption that any such female fruit-fly with an excessively large 
number of female offspring will in fact be either an entirely normal female or a lethal 
carrier. That is to say, we exclude such a contingency as the possibility that there is 
an endemic rare virus disease more fatal to male than to female larvae. We may then 
with justifiable assurance postulate two hypotheses about any female in the culture: 

Hypothesis A: The female is normal, in which event the probability that any one of 
her offspring will be female is $p_a = 1/2$. 

Hypothesis B: The female is a lethal carrier, whence the probability that any one 
of her offspring will be female is $p_b = 2/3$. 

We shall now suppose that a particular female has 144 offspring, and examine 
the current theory of test procedure when the end in view is to decide whether we 
shall adopt one or other hypothesis. Our initial concern will thus be with what the 
test prescribes, and as such has no necessary connexion with whether it leads us 
to a correct decision. We first note that each hypothesis equally prescribes for 
144-fold fraternities referable to a single fly mother the long-run frequency of such 
as respectively contain 0, 1, 2, ..., 143, 144 females. We may in fact specify the 
relevant parameters thus: 





Hypothesis   Size of Sample   Probability that any         Mean No. of Females     S.D. of Score 
             (fraternity)     Single Offspring is Female   in Sample Fraternity    Distribution of Sample 

A            144              $p_a = 1/2$                  $M_a = 72$              $\sigma_a = 6.00$ 
B            144              $p_b = 2/3$                  $M_b = 96$              $\sigma_b = 5.66$ 
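
The entries of the table follow from the binomial law: for a fraternity of n offspring, each female with probability p, the mean number of females is np and the standard deviation is the square root of np(1 - p). A minimal sketch of the arithmetic, with a function name and variable names of our own choosing:

```python
import math

def binomial_mean_sd(n, p):
    """Mean and standard deviation of the number of females among n offspring."""
    return n * p, math.sqrt(n * p * (1.0 - p))

mean_a, sd_a = binomial_mean_sd(144, 1.0 / 2.0)  # Hypothesis A: normal mother
mean_b, sd_b = binomial_mean_sd(144, 2.0 / 3.0)  # Hypothesis B: lethal carrier

print(round(mean_a, 2), round(sd_a, 2))
print(round(mean_b, 2), round(sd_b, 2))
```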
From an algebraic viewpoint neither hypothesis specified above has anything to 
commend it as preferable to the alternative; but we may lazily and arbitrarily agree 
to consider first of all the consequences of adopting Hypothesis A as the null hypothesis 
in the traditional sense, if only because laboratory and field workers would commonly 
do so in a comparable situation. Lazily and arbitrarily also, we shall first adopt a 
modular criterion of rejection or acceptance for the same reason, i.e. we shall reject 
the hypothesis chosen unless the number of females x is such that: 


$|x - M_a| < X_a$ 


In conformity likewise with current convention, we shall choose the score $X_a$ so that 
the probability ($\alpha$) that x will lie in the critical region, i.e. outside the range specified 
above, is about 0.05, if the null hypothesis correctly describes the situation. For 
samples of 144 and values of $p_a$ (or $p_b$) anywhere (as in this example) within the range 
0.1 to 0.9, the normal integral gives an adequate quadrature at the so-called significance 
level 95 per cent., if we make the appropriate half-interval correction. If 
we choose $X_a = \pm 12.5$, so that $(x - M_a) = X_a$ when $(x - M_a) \approx \pm 2.08\sigma_a$, the 
table of the normal integral sets $\alpha \approx 0.038$. We have then made the decision to 
regard a female with 144 offspring as normal if the number of her female offspring 
lies in the range of 60 to 84 inclusive and to reject her claims as such, i.e. in this 
context to regard her as a lethal carrier, if her female offspring number more than 
84 or less than 60. 
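
The error of the first kind under the modular criterion can be checked by a short computation. The sketch below assumes the normal approximation with the half-interval correction described above, and uses the error function of the Python standard library in place of a printed table; the helper `normal_cdf` and the variable names are our own. The result agrees with the cited figure to within the rounding of a printed table entry.

```python
import math

def normal_cdf(z):
    """Area of the unit normal curve from minus infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p_a = 144, 0.5
mean_a = n * p_a                        # 72 females expected under Hypothesis A
sd_a = math.sqrt(n * p_a * (1 - p_a))   # 6
half_width = 12.5                       # accept 60..84, i.e. limits at 59.5 and 84.5

z = half_width / sd_a                   # about 2.08 standard deviations
alpha = 2.0 * (1.0 - normal_cdf(z))     # both tails of the critical region

print(round(z, 3), round(alpha, 3))
```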

In the Neyman-Pearson theory of test procedure we speak of $\alpha$ as the conditional 
probability of making an error of the first kind. Now the cited value of $\alpha$ ($\approx 0.038$) 
correctly assigns the probability of rejecting the null hypothesis only on the 
assumption that the latter is true, i.e. that the mother fly is normal. This we do 
not know, the aim of the test being to throw light on the alternative possibility. 
If we carry out the rule of the test consistently, we shall sometimes make an error of 
the first kind, i.e. reject normal flies as such and by the same token wrongly identify 
as carriers flies which are indeed normal. Conversely, we shall sometimes apply the 
test to flies which are indeed carriers. If the number of females among their progeny 
lies within the range 60-84 inclusive we shall reject them as such. We shall then 
wrongly accept the null hypothesis. This is the error of the second kind, which we 
make in this context if the relevant parameters of the appropriate distribution are: 

$p_b = 2/3$, $M_b = 96$, and $\sigma_b = 5.66$. 
With due regard to the half-interval correction, the region we then exclude is from 
59.5 to 84.5, bounded by $(x - M_b) = -36.5$ and $(x - M_b) = -11.5$, i.e. 
$(x - M_b) \approx -6.4\sigma_b$ and $(x - M_b) \approx -2.03\sigma_b$. Since the area of the normal 
integral of unit variance from $-\infty$ up to $-6.4$ is utterly trivial, we make no sensible 
error if we say that the consistent application of the rule leads us now to reject carriers 
as such with a probability ($\beta$) assigned by the area of the normal curve of unit 
variance in the range from $-\infty$ to $-2.03$. One speaks of this loosely as the probability 
of making an error of the second kind; and the table of the normal integral in this 
case cites the value $\beta \approx 0.0212$. More explicitly, $\beta$ is the conditional probability 
of accepting the null hypothesis, if it is false; but we have as yet said nothing about 
how often it will be. 
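
The error of the second kind admits the same kind of check. The sketch below again assumes the normal approximation with limits at 59.5 and 84.5, and evaluates the area under the carrier distribution lying between them; `normal_cdf` and the variable names are our own.

```python
import math

def normal_cdf(z):
    """Area of the unit normal curve from minus infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p_b = 144, 2.0 / 3.0
mean_b = n * p_b                        # 96 females expected under Hypothesis B
sd_b = math.sqrt(n * p_b * (1 - p_b))   # about 5.66
lower, upper = 59.5, 84.5               # region in which we accept Hypothesis A

z_lower = (lower - mean_b) / sd_b       # about -6.4: contributes a negligible area
z_upper = (upper - mean_b) / sd_b       # about -2.03
beta = normal_cdf(z_upper) - normal_cdf(z_lower)

print(round(z_lower, 2), round(z_upper, 2), round(beta, 4))
```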

In short, the only information we have at our disposal so far bears on the 
probability ($\alpha$) of rejecting the null hypothesis when it is true, and that ($\beta$) of rejecting 
the alternative when the latter is true, i.e. of accepting the null hypothesis when it is 
false. If we now suppose that we actually know the proportion of normal and carrier 
females in the culture, we can take our analysis a decisive step forward. We shall 
assume that the culture consists of 500 mothers, of which 450 are normal and 50 are 
carriers. If we then choose at random* any single fly with 144 offspring as a test 
subject we may say that: 

(i) $P_a = 0.9$ is the probability that it will be normal, i.e. the probability that the null 
hypothesis is applicable to the test subject; 

(ii) $P_b = 0.1 = (1 - P_a)$ is the probability that it will be a carrier, i.e. that the 
alternative hypothesis is applicable to the test subject. 

We now have all the relevant data for a statistical specification of the long-run 
frequency of all four possible results of the outcome of the test: 

(1) The fly is normal and we rightly accept it as such: 
$P_a(1 - \alpha) = (0.9)(0.962) = 0.866$ 
(2) The fly is normal and we wrongly reject it as such: 
$P_a \alpha = (0.9)(0.038) = 0.034$ 
(3) The fly is a carrier and we rightly accept it as such: 
$P_b(1 - \beta) = (1 - P_a)(1 - \beta) = (0.1)(0.979) = 0.098$ 
(4) The fly is a carrier and we wrongly reject it as such: 
$P_b \beta = (1 - P_a)\beta = (0.1)(0.021) = 0.002$ 

To each assertion consistent application of the rule leads us to make we may 
thus assign a probability that it will be true or false. We may then set out a balance 
sheet as follows: 





                         Assertion True                               Assertion False 
Null Hypothesis True     $P_a(1 - \alpha) = 0.866$                    $P_a \alpha = 0.034$ 
Null Hypothesis False    $(1 - P_a)(1 - \beta) = 0.098$               $(1 - P_a)\beta = 0.002$ 
Total                    $1 - \beta - (\alpha - \beta)P_a = 0.964$    $\beta + (\alpha - \beta)P_a = 0.036$ 





In conformity with the definition given above, we may legitimately speak of the 
probability ($P_t$) of making a correct decision, or the probability ($P_f$) of making a 
false one, by consistent application of the rule, in which case our balance sheet yields: 

$P_t = 1 - \beta - (\alpha - \beta)P_a = 96.4$ per cent.    (i) 
$P_f = \beta + (\alpha - \beta)P_a = 3.6$ per cent.    (ii) 
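
The balance sheet may be reproduced in a few lines. The sketch below takes the rounded error probabilities cited in the text (0.038 and 0.021) and the assumed proportion of normal mothers (0.9) as given; the function `balance_sheet` and its dictionary keys are our own labels, not part of any established routine.

```python
def balance_sheet(alpha, beta, prior_a):
    """Long-run frequencies of the four possible outcomes of the test.

    prior_a is the proportion of test subjects to which the null hypothesis applies.
    """
    prior_b = 1.0 - prior_a
    return {
        "null true, assertion true": prior_a * (1.0 - alpha),
        "null true, assertion false": prior_a * alpha,
        "null false, assertion true": prior_b * (1.0 - beta),
        "null false, assertion false": prior_b * beta,
    }

alpha, beta, prior_a = 0.038, 0.021, 0.9
sheet = balance_sheet(alpha, beta, prior_a)

p_true = sheet["null true, assertion true"] + sheet["null false, assertion true"]
p_false = sheet["null true, assertion false"] + sheet["null false, assertion false"]

for outcome, frequency in sheet.items():
    print(outcome, round(frequency, 3))
print(round(p_true, 3), round(p_false, 3))
```

The two column totals agree with the closed forms of Equations (i) and (ii), as the algebra requires.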

For reasons we shall come to later, the outcome of our choice of a rejection 
criterion is here vastly more encouraging than need be in most situations; but we can 
do better. We have lazily adopted a modular criterion because laboratory and field 


* The effect of the lethal gene on the fertility of the fly introduces a bias for which we can allow, and one 
which we may therefore deliberately neglect for heuristic purposes. 
workers commonly do so, regardless of the end in view, when the sample distribution 
prescribed by the null hypothesis is symmetrical. Now fraternities of 144 flies of 
which less than 60 are females will be vastly less common if the mother is a carrier 
than they would otherwise be. It would thus seem to be more reasonable to restrict 
our attention to families with an excessive number of females. If we now therefore 
adopt a vector criterion, i.e. reject as normal only mothers with more than 84 
female offspring, we exclude only one tail of the approximately normal distribution 
and halve our error of the first kind, i.e. set $\alpha \approx 0.019$. For reasons stated this 
does not materially affect the value of $\beta$, since the chance that a carrier will have less 
than 60 females among 144 offspring is negligible. If we then say that we shall reject 
the null hypothesis at the vector level $+2.08\sigma$ in contradistinction to the modular 
level $\pm 2.08\sigma$, we now put $\alpha \approx 0.019$ but $\beta \approx 0.021$ as before. Whence our balance 
sheet summarized by Equations (i) and (ii) becomes: 
$P_t = 0.981 = 98.1$ per cent. 
$P_f = 0.019 = 1.9$ per cent. 

[Fig. 1. Testing two hypotheses. Null Hypothesis A: $p_a = 1/2$. Alternative B: $p_b = 2/3$. Panel A: Modular criterion, limits 59.5 and 84.5, $\alpha \approx 0.038$, $\beta \approx 0.021$.] 

That the adoption of the vector criterion does in fact give a better prognosis of correct 
decision is not surprising; and Fig. 1 sufficiently exhibits why this is so in the situation 
under discussion. 

At this stage we may also note with profit an interesting consequence of (i). If 
$\alpha \approx \beta$ so that $1 - \beta \approx 1 - \alpha$ and $(\alpha - \beta) \approx 0$, Equation (i) reduces to 
$P_t \approx (1 - \alpha)$. Within the framework of our assumptions that there is only one 
admissible alternative to the null hypothesis, so that $P_b = (1 - P_a)$, we can assign a 
value to the long-run frequency of correct decision based on consistent application of 
the rule without any prior knowledge ($P_a$ or $P_b$) of the population at risk if we define 
our rejection criterion in such a way as to equalize the probabilities of errors of the two 
kinds. We can then predetermine that the value of $\alpha$ may be as 
small as we care to make it by prescribing a sample size sufficiently large. Needless 
to say, this presupposes the possibility of defining the distribution function of the 
single admissible alternative hypothesis. 

Within the same framework of assumptions and in the same model set-up, let us 
now explore the effects of making our criterion for rejecting the null hypothesis more 
exacting, in the sense that our error of the first kind is less. Thus we shall decide to 
accept a female fly with 144 offspring as normal if (vector criterion) she has 88 or 
less female offspring and deem her (rightly or wrongly) to be a lethal carrier if she 
has 89 or more. We then set the decision limits on either side of $x = 88.5$, in which
event: $(x - M_a) = +2.75\sigma_a$, and $(x - M_b) = -1.33\sigma_b$.

Whence from the table of the normal integral we derive $\alpha = 0.003$ and $\beta = 0.092$.
If we put these values in Equation (i), with $P_a = 0.9$ as in our first example, we get:

$$1 - \beta = 0.908 \quad\text{and}\quad (\alpha - \beta)P_a = -0.080$$
$$\therefore\ P_t = 0.988 \text{ or } 98.8 \text{ per cent.}$$
In this situation little advantage (98-8 as against 98-1 per cent.) accrues from making 
our criterion for rejection of the null hypothesis more exacting; but we have chosen 
our null hypothesis as the hypothesis with greater prior probability since the culture 
contains a great excess of normal flies. Let us then reverse the situation by postulating 
that the culture contains 450 lethal carriers and 50 normal among 500 female flies 
in all, i.e. $P_a = 0.1$ and $P_b = 0.9$. In this case $(\alpha - \beta)P_a = -0.009$, so that
$P_t = 0.908 + 0.009 = 0.917$ or 91.7 per cent.

If the null hypothesis is referable to the smaller population at risk (i.e. if it has lower
prior probability than the alternative) the effect of making the rejection criterion
more exacting is thus to lower the probability of arriving at a correct decision.
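
The arithmetic of the two criteria can be checked with a short sketch. It assumes the normal approximation used in the text and takes Equation (i) in the form $P_t = (1 - \beta) - (\alpha - \beta)P_a$, the form in which the text later writes its truth equation. The function names are illustrative only.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal integral, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def conditional_risks(n, p_null, p_alt, cut):
    """Risks for the vector criterion 'reject the null hypothesis if x > cut'."""
    mean_a, sd_a = n * p_null, sqrt(n * p_null * (1 - p_null))
    mean_b, sd_b = n * p_alt, sqrt(n * p_alt * (1 - p_alt))
    alpha = 1.0 - normal_cdf((cut - mean_a) / sd_a)  # reject the null when true
    beta = normal_cdf((cut - mean_b) / sd_b)         # accept the null when false
    return alpha, beta

def prob_correct(alpha, beta, prior_null):
    """Equation (i): P_t = (1 - beta) - (alpha - beta) * P_a."""
    return (1.0 - beta) - (alpha - beta) * prior_null

# Fraternities of 144: normal mothers (p = 1/2) against lethal carriers (p = 2/3).
for cut in (84.5, 88.5):
    alpha, beta = conditional_risks(144, 1 / 2, 2 / 3, cut)
    for prior_null in (0.9, 0.1):
        p_t = prob_correct(alpha, beta, prior_null)
        print(f"cut={cut}  alpha={alpha:.3f}  beta={beta:.3f}  "
              f"P_a={prior_null}  P_t={p_t:.3f}")
```

With the cut at 84.5 the two risks are nearly equal, so $P_t$ barely moves with $P_a$; with the cut at 88.5 it falls appreciably when the null hypothesis has the lower prior probability.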

Before discussing how far this rule is of general application within the framework 
of our model assumptions, let us take stock of another highly relevant variable, 
viz. sample size. For a fixed size of sample the foregoing results have sufficiently 
emphasized what a visual diagram suffices to demonstrate (Figs 2-4), i.e. we cannot
decrease the conditional probability ($\alpha$) of an error of the first kind without increasing
the conditional probability ($\beta$) of an error of the second kind and vice versa. It is
also of importance to appreciate that we can make $\beta$ for a pre-assigned value of $\alpha$ as
small as we wish to make it, if we make the size of the sample large enough. Conversely,
we can keep $\alpha$ at a pre-assigned level for a smaller sample only by making $\beta$ larger.

Consider for example the consequence of applying the foregoing test to fraternities 
of 100, so that $M_a = 50$, $\sigma_a = 5$, $M_b = 66.6$, and $\sigma_b = 4.71$. If we make the vector
criterion of acceptance or rejection conveniently near the $2\sigma$ level, we may set it on
either side of a score level 60.5 in which event $(x - M_a)$ is approximately $2.1\sigma_a$ and
$\alpha = 0.036$. If so, $(x - M_b)$ is approximately $-1.32\sigma_b$ and $\beta \simeq 0.0934$. The
value of $\alpha$ here agrees as closely as need be for exemplary purposes with the value
chosen ($\alpha = 0.038$) for the 144-fold sample when $\beta \simeq 0.021$. Thus the effect of
reducing the sample size is to increase more than 4-fold the probability of an error
of the second kind for a corresponding probability of error of the first kind.
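
The dependence of $\beta$ on sample size for a pre-assigned $\alpha$ can be sketched as follows. This is an illustration under stated assumptions: the normal approximation, a one-tailed (vector) criterion, a cut-off treated as continuous rather than placed on a half-integer score, and $\alpha = 0.019$ taken from the 144-fold vector criterion above. The helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def beta_for_fixed_alpha(n, alpha, p_null=1 / 2, p_alt=2 / 3):
    """Risk of the second kind when the one-tailed cut-off is placed to give alpha."""
    mean_a, sd_a = n * p_null, sqrt(n * p_null * (1 - p_null))
    mean_b, sd_b = n * p_alt, sqrt(n * p_alt * (1 - p_alt))
    cut = mean_a + STD.inv_cdf(1.0 - alpha) * sd_a  # continuous cut-off (assumption)
    return STD.cdf((cut - mean_b) / sd_b)

for n in (50, 100, 144, 200, 400):
    print(n, round(beta_for_fixed_alpha(n, 0.019), 4))
```

Under these assumptions the fall from 144 to 100 offspring raises $\beta$ more than 4-fold, in line with the comparison made in the text.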

In this case we can make our two error risks nearly equal by setting our limits of
rejection and acceptance for the null hypothesis on either side of the score $x = 58.5$;
in which event the null hypothesis sets the upper limit of acceptance at $+1.7\sigma_a$ and
the alternative sets the lower limit for rejection at $-1.74\sigma_b$. Thus $\alpha \simeq 0.045$ and
$\beta \simeq 0.041$. If $P_a = 0.9$, as in our first example, $P_t \simeq 95.5$ per cent. For the 144-fold
sample we have $P_t = 98.1$ per cent. when the two conditional risks are nearly equal.

[Fig. 2. Testing alternative hypotheses. Hypothesis A: mean $(M) = 19.8$, $\sigma_m = 5$. Hypothesis B: mean $(M) = 36.2$, $\sigma_m = 5$. Criterion of acceptance for Hypothesis A: $x \le 32.1$. Error of first kind (rejecting Hypothesis A when it is true): probability $\alpha \simeq 0.007$.]

Before we take the next one, we may well retrace our steps. We made the arbitrary decision
to designate as our null hypothesis the assertion that the mother fly is normal. Actually,
we have given no reason for doing so; and we may pause at this stage to dispose of a
misconception widespread among those who carry out routine tests within the framework of
the unique null hypothesis.

There is prevalent a somewhat naive view that we choose our null hypothesis as a safeguard
against wishful thinking, and that we make accordingly our criterion of rejection as exacting
as need be. On such a view our criterion of rejection is at best a disciplinary convention;
and as such has nothing to do with unconditional statistical inference. Also, one
can justify it as such only if one chooses the null hypothesis on the understanding that
one wishes to fall backwards in preserving one's rectitude, i.e. if the null hypothesis is
actually the one the investigator has reasons for believing to be false. Evidently no
recipe that the best Mrs. Beeton can prescribe will indeed meet one's requirements in all
situations. If experiments on laboratory stocks have convinced the investigator that
a new therapy is preferable to a current procedure, the enthusiastic research worker
will not reasonably impose on the null hypothesis a criterion of rejection as exacting as
that of the investigator undertaking experiments to test the credentials of extra-sensory
perception. In conformity with current procedure, he will nevertheless invoke a null
hypothesis of the same type in either situation, and with the same convention (e.g. 0.05
significance level) of rejection, if accustomed to rely on current cookery book recipes.
The reason is that the cookery book recipe will commonly prescribe as the appropriate null
hypothesis the one which commends itself to the mathematician for reasons which have nothing
to do with the operational intention of the scientific worker.

[Fig. 3. Testing alternative hypotheses. Hypothesis A: $M = 19.8$. Hypothesis B: $M = 36.2$. Criterion of acceptance for Hypothesis A: $x \le 23.9$.]

In the model situation discussed hitherto, we have, in fact, sidestepped the temptation to
choose our null hypothesis for this reason, since it would be equally easy to adopt as such
the postulate that the fly mother is a lethal carrier. A rejection criterion identical in
terms of the conditional risk of error of the first kind, as is indeed the most we can
specify within the framework of a unique null hypothesis, would then lead us to results
numerically inconsistent with those we have so far explored. The reader may check this
assertion by reversing the role of the two hypotheses in the foregoing examples.

Partly because of the size of the samples chosen, previous tests in our model 
situation have led to a high probability of correct decision arrived at in conformity 
with traditional procedure, i.e. within the framework of the unique null hypothesis.
This will lead us to a totally wrong view of what we can rely on it to accomplish, 
if we fail to take stock of two background conditions plausibly invoked in the 
prescribed set-up, but rarely admissible in other situations: 


(a) we concede only one admissible alternative to the null hypothesis;
(b) we have postulated a complete specification of the sampling distribution in
terms of the alternative thereto.

It will be simpler, if we first examine the implications of (b). In all examples
hitherto cited, we have found that $P_t > 0.5$, i.e. that more than 50 per cent. of our
decisions will be right if we consistently follow the last prescription, in which event 
we shall be more often right than wrong. Now there is no reason why this should 
be true of our situation, other than the fact that we can here fix in advance the 
size of the sample and the criterion of rejection or acceptance for the null one with 
due regard to the value of the relevant parameters of both hypotheses. To see the 
relevance of the consideration last stated, let us now replace the female carriers of 
a sex-linked lethal gene by females with a virus infection to which their male offspring 
succumb somewhat more readily than their sister flies. We shall postulate a sex 
ratio of 11:9 in favour of females among the progeny of infected mothers. Our 
alternative hypothesis is now that $p_b = 0.55$.

On the new hypothesis, which we again provisionally assume to be the only
admissible alternative to the null one ($p_a = \frac{1}{2}$), we have $M_b = 55$ and $\sigma_b \simeq 4.98$ for
fraternities of 100. If we set our rejection criterion on either side of $x = 60.5$, we
have as before

$$(x - M_a) = +2.1\sigma_a \quad\text{and}\quad (x - M_b) = +1.1\sigma_b,$$

whence $\alpha = 0.018$ and $\beta = 0.864$.
Thus $(1 - \beta) = 0.136$ and $(\alpha - \beta) = -0.846$;

whence, from Equation (i),

$P_t \simeq 89.7$ per cent. when $P_a = 0.9$

$P_t \simeq 22.0$ per cent. when $P_a = 0.1$

This example illustrates the important role of $P_a$ and $P_b = (1 - P_a)$ which define*
the populations at risk under each hypothesis. If $P_a = 0.1$, we have $P_t = 22$ per cent.
and $P_f = 78$ per cent., i.e. consistent application of the rule will lead us to be wrong
more often than right when the sample is as small as 100, but we can fix the size of
the sample to ensure that $P_t > \frac{1}{2}$ only if we can assign at least a lower limit to $P_a$,
and then only if we can assign a value to $\beta$. Now we cannot assign a value to $\beta$ unless
we can specify the appropriate parameter (in this case $p_b$) or parameters of the
sample distribution of the admissible alternative hypothesis. In any case, it will
rarely happen that we can as easily conceptualize the meaning we here confer on $P_a$,
and still more rarely that we can equip it with a numerical value.
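
The virus example can be reproduced with a short sketch, again assuming the normal approximation and Equation (i) in the form $P_t = (1 - \beta) - (\alpha - \beta)P_a$. The break-even prior computed at the end is an illustrative addition: it is the value of $P_a$ at which $P_t$ is exactly one half under these assumptions.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal integral, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, cut = 100, 60.5          # fraternities of 100, decision limit as in the text
p_null, p_alt = 0.5, 0.55   # normal mother against virus-infected mother

alpha = 1.0 - normal_cdf((cut - n * p_null) / sqrt(n * p_null * (1 - p_null)))
beta = normal_cdf((cut - n * p_alt) / sqrt(n * p_alt * (1 - p_alt)))

def prob_correct(prior_null):
    """Equation (i): P_t = (1 - beta) - (alpha - beta) * P_a."""
    return (1.0 - beta) - (alpha - beta) * prior_null

# Prior probability of the null hypothesis at which P_t = 1/2 (illustrative).
break_even = (beta - 0.5) / (beta - alpha)

print(f"alpha={alpha:.3f}  beta={beta:.3f}")
for prior_null in (0.9, 0.1):
    print(f"P_a={prior_null}  P_t={prob_correct(prior_null):.3f}")
print(f"P_t exceeds 1/2 only if P_a exceeds {break_even:.2f}")
```

The last line makes concrete why a lower limit to $P_a$ is needed before the rule can be relied on to be right more often than wrong.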

The considerations last stated do not exhaust limitations of current test procedure 
within the framework of the unique null hypothesis. We have so far assumed a 
single admissible alternative, and we can very rarely make such an assumption 
with propriety. Indeed, much statistical enquiry goes on against a background of 
an infinitude of admissible alternatives to any null hypothesis we may advance. 
In our model set-up we have already adumbrated this complication. So we may now 
with profit postulate three admissible hypotheses. We shall then get into focus a
general conclusion concerning the utmost we can legitimately infer in the domain
of unconditional assertion from the outcome of a decision test referable to a unique
null hypothesis in the absence of background information concerning admissible
alternatives thereto. We may anticipate it by reinterpreting Equation (ii) above.
When $\beta > \alpha$ in Equation (ii), $(\alpha - \beta)$ is negative and $P_f \le \beta$, i.e. the
unconditional uncertainty safeguard is no greater than the greater conditional risk.
When $\beta < \alpha$, $(\alpha - \beta)$ is positive and $P_f \ge \beta$, i.e. the unconditional uncertainty
safeguard is no less than the smaller conditional risk.

* The parameters $P_a$, $P_b$ above, and $P_c$ in what follows, represent the prior probabilities of Bayes's
theorem. The assumption of their equality when we do not know their values (Bayes's scholium) is
indeed the vulgar error of neglecting the population at risk.


The conclusions last stated refer to a situation in which only two hypotheses are 
admissible. An examination of a model situation in which three hypotheses are 
admissible will exhibit it as a particular case of a general proposition of limits, 
expressible as follows, on the assumption that our concern is with the long-run 
proportion of the true assertions we make on the basis of a test procedure of the type 
under discussion: 

(1) The worst that can happen is that we shall exclusively encounter situations
prescribed by the hypothesis associated with the greatest conditional risk ($p_g$) of
erroneous rejection.

(2) The best that can happen is that we shall exclusively encounter situations
prescribed by the hypothesis associated with the smallest conditional risk ($p_s$) of
erroneous rejection.

(3) Accordingly, our uncertainty safeguard $P_f$ will be within the limits $p_s$ to $p_g$ and
this means that we can set an explicit upper limit to it, if knowledge of the range
of admissible hypotheses permits us to assign a value to $p_g$.

Our new model will be that the following three hypotheses are admissible with 
reference to our fruitfly culture: and we therefore start by assuming that we can define 
the relevant parameters and sample distributions referable to each of them: 


Hypothesis A: The female is normal                          $p_a = \frac{1}{2} = \frac{30}{60}$
Hypothesis B: The female transmits a virus infection        $p_b = \frac{11}{20} = \frac{33}{60}$
Hypothesis C: The female carries a sex-linked lethal gene   $p_c = \frac{2}{3} = \frac{40}{60}$

We shall designate the proportions of the three types of female fruitflies as
respectively $P_a$, $P_b$, $P_c$, so that $(P_a + P_b + P_c) = 1$. In real life there is no reason to
exclude the possibility that a lethal carrier could transmit the virus; but we assume for
argument that such flies are 100 per cent. resistant to it. We are free to select any one
of the above as our null hypothesis; and shall first assume that our null hypothesis is
$p = p_a$. In that event we may choose a single rejection criterion $x > x_{ac}$, thereby
fixing the conditional risk of rejecting the null hypothesis when true as $\alpha$, that of
rejecting Hypothesis B when true as $\beta$, and that of rejecting Hypothesis C when
true as $\gamma$. The unconditional probability of an error of the first kind is $\alpha P_a$. We
accept the null hypothesis erroneously, if we either reject Hypothesis B when true
or reject Hypothesis C when true. Whence the probability of wrong acceptance is
$\beta P_b + \gamma P_c$, and the probability of making a false decision of one sort or the other is:

$$P_f = \alpha P_a + \beta P_b + \gamma P_c \tag{iii}$$

This is equivalent to:

$$P_t = (1 - P_f) = 1 - \alpha P_a - \beta P_b - \gamma P_c.$$

If our null hypothesis is $p = p_a$, our rejection criterion must make $\gamma < \beta$ since $p_b$
lies nearer to $p_a$ than does $p_c$. If we define it so that $\alpha = \beta$ (whence $\alpha > \gamma$) we may
write (iii) in the form:

$$\begin{aligned}
P_t &= 1 - \alpha(P_a + P_b) - \gamma P_c \\
    &= 1 - \alpha(1 - P_c) - \gamma P_c \\
\therefore\ P_t &= 1 - \alpha + (\alpha - \gamma)P_c.
\end{aligned}$$

Since $\alpha > \gamma$, on the assumption that $\alpha = \beta$, $(\alpha - \gamma)$ is positive and

$$P_t \ge (1 - \alpha) \tag{iv}$$

We have here arbitrarily chosen as our null hypothesis $p = p_a$. In principle, the
procedure would be alike if we chose as our null hypothesis $p = p_c$. If we choose
the second hypothesis, it will be different, because we must now adopt a modular
rejection criterion, i.e. we reject Hypothesis B in favour of Hypothesis A if $x < x_{ab}$,
and reject Hypothesis B in favour of Hypothesis C if $x > x_{bc}$. We shall then denote
by $\beta_a$ the probability of rejecting the null hypothesis when true in favour of
Hypothesis A, and by $\beta_c$ that of rejecting the null hypothesis when true in favour of
Hypothesis C. The probability of erroneous rejection is then $P_b(\beta_a + \beta_c)$. The
probability of erroneous acceptance is $(\alpha P_a + \gamma P_c)$, and $P_f = P_b(\beta_a + \beta_c) + \alpha P_a + \gamma P_c$.
If we choose our two rejection criteria so that $\alpha = \beta_a$ and $\gamma = \beta_c$, we then obtain

$$\begin{aligned}
P_f &= (P_a + P_b)\beta_a + (P_b + P_c)\beta_c \\
    &= (1 - P_c)\beta_a + (1 - P_a)\beta_c \\
\therefore\ P_f &\le (\beta_a + \beta_c) \\
\therefore\ P_t &\ge 1 - (\beta_a + \beta_c)
\end{aligned} \tag{v}$$

Since the conditional risk of an error of the first kind is $(\beta_a + \beta_c) = \beta$, the choice
of Hypothesis B as our null hypothesis leads us to a result equivalent to Expression (iv):
i.e. $P_t \ge 1 - \beta$.
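
Expression (iv) can be illustrated numerically. The sketch below is a set of assumptions rather than the authors' procedure: it takes fraternities of 144, uses the normal approximation, and places the single cut-off so that $\alpha = \beta$ with Hypothesis A as the null one. The prior triples are arbitrary illustrative values.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal integral, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def moments(n, p):
    """Mean and standard deviation of the binomial score."""
    return n * p, sqrt(n * p * (1 - p))

n = 144  # illustrative sample size
(m_a, s_a), (m_b, s_b), (m_c, s_c) = (moments(n, p) for p in (1 / 2, 11 / 20, 2 / 3))

# Cut-off equally distant, in standard units, from the means of A and B,
# so that alpha = beta.
cut = (m_a * s_b + m_b * s_a) / (s_a + s_b)
alpha = 1.0 - normal_cdf((cut - m_a) / s_a)  # reject A when A is true
beta = normal_cdf((cut - m_b) / s_b)         # accept A when B is true
gamma = normal_cdf((cut - m_c) / s_c)        # accept A when C is true

def prob_false(P_a, P_b, P_c):
    """Equation (iii): P_f = alpha*P_a + beta*P_b + gamma*P_c."""
    return alpha * P_a + beta * P_b + gamma * P_c

for priors in [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8), (1 / 3, 1 / 3, 1 / 3)]:
    print(priors, round(prob_false(*priors), 4), "<=", round(alpha, 4))
```

Whatever proportions of the three types are assumed, $P_f$ does not exceed $\alpha$, so $P_t \ge 1 - \alpha$.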

Thus we can always choose our rejection criterion or criteria to make the uncertainty
safeguard no greater than the conditional risk of error of the first kind, when either
of only two alternatives to the null hypothesis is admissible, each being also discretely
specifiable. With the same reservation we can make the probability of correct decision
as near to unity as we care by making the size of the sample appropriately large.

So far we have presumed a backstage view of the situation, i.e. that we can in
fact specify precisely each admissible alternative to the null hypothesis; but it will rarely
happen that we can in fact do so. If we cannot specify either admissible alternative,
we must retrace our steps, recalling that the form of the uncertainty safeguard does
not depend on the choice of a modular or vector criterion of rejection. If $p_h$ is the
conditional risk of rejecting Hypothesis H when true, we may write it for any three
hypotheses with discrete parameters $p_0$, $p_1$, $p_2$ as

$$P_f = \sum_{h=0}^{h=2} P_h\, p_h \tag{vi}$$

In this expression,

$$\sum_{h=0}^{h=2} P_h = 1 \tag{vii}$$

We shall denote by $p_g$ the greatest and by $p_s$ the smallest values of $p_h$ without assuming
to which hypothesis either is assignable. We can then express either of the remaining
values as:

$$p_h = p_g - e_{h.g} \quad\text{and}\quad p_h = p_s + e_{h.s}$$

By definition, each value of $e_{h.g}$ and $e_{h.s}$ in the above is either positive or zero. On this
understanding we may write our uncertainty safeguard alternatively as:

$$P_f = \sum_{h=0}^{h=2} P_h\,(p_g - e_{h.g}) \quad\text{and}\quad P_f = \sum_{h=0}^{h=2} P_h\,(p_s + e_{h.s})$$

$$\therefore\ P_f = p_g \sum_{h=0}^{h=2} P_h - \sum_{h=0}^{h=2} P_h\, e_{h.g} \quad\text{and}\quad P_f = p_s \sum_{h=0}^{h=2} P_h + \sum_{h=0}^{h=2} P_h\, e_{h.s}$$

$$\therefore\ P_f = p_g - \sum_{h=0}^{h=2} P_h\, e_{h.g} \quad\text{and}\quad P_f = p_s + \sum_{h=0}^{h=2} P_h\, e_{h.s}$$

$$\therefore\ p_g \ge P_f \ge p_s.$$

We may now generalize the utmost we can legitimately infer from the performance 
of a decision test within the framework of a unique null hypothesis, as we have been 
accustomed to perform it, i.e. against the background of an unknown, if finite, 
number (m) of admissible alternatives to it, and with no precise specification of any 
one of them. As before, we shall denote by $p_h$ the probability of rejecting the $h$th
member of the $m$-fold set when it is right, and by $p_0$ the corresponding risk of rejecting
when right the hypothesis arbitrarily chosen as the null one ($p = p_0$), so that

$$P_f = \sum_{h=0}^{h=m} P_h\, p_h$$

Consider now the hypothesis for which the risk of rejection ($p_g$) when true is greatest,
so that $e_{h.g}$ is positive if $p_h = (p_g - e_{h.g})$ and $e_{g.g} = 0$.

$$\begin{aligned}
P_f &= p_g \sum_{h=0}^{h=m} P_h - \sum_{h=0}^{h=m} P_h\, e_{h.g} \\
\therefore\ P_f &= p_g - \sum_{h=0}^{h=m} P_h\, e_{h.g} \\
\therefore\ P_f &\le p_g
\end{aligned} \tag{viii}$$

If $p_s$ is the smallest value of $p_h$ so defined, we may write $p_h = (p_s + e_{h.s})$, in which
$e_{h.s}$ is again positive and $e_{s.s} = 0$, so that:

$$\begin{aligned}
P_f &= p_s \sum_{h=0}^{h=m} P_h + \sum_{h=0}^{h=m} P_h\, e_{h.s} \\
\therefore\ P_f &\ge p_s
\end{aligned} \tag{ix}$$

The situation last discussed suffices to get into focus the most we can say about
the outcome of a single hypothesis decision test within the domain of unconditional
inference, when, as commonly, we can neither state how many alternatives to it are
each admissible nor specify each in numerical terms. If we denote by $p_s$ and $p_g$
respectively the least value and the greatest value of the probability of denying, in
conformity with the rejection criterion chosen, any one of the set of hypotheses when
true, the probability of a false verdict lies between limits definable as:

$$p_s \le P_f \le p_g \tag{x}$$

Unless we can specify the parameters of each admissible alternative to the hypothesis
chosen as the null one, we can say no more about $p_g$ in Expression (x) than it is less
than unity; but if we can specify each admissible alternative hypothesis precisely, we
can choose our rejection criterion so that the probability ($\alpha$) of rejecting the null
hypothesis when true is the greatest of the conditional risks, i.e. $\alpha = p_g$, whence $P_f \le \alpha$.
By appropriate choice of
sample size we can then make the probability of a correct verdict on the null hypothesis
as near to unity as we like, without invoking any information with reference to its
prior probability (Neyman and Pearson, 1933). We thus arrive at the following
conclusion: the decision test procedure may be informative in the domain of uncon- 
ditional inference but if, and only if, we can precisely specify each of an exhaustive and 
exclusive set of hypotheses. 
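
The limits in Expression (x) follow from $P_f$ being a weighted mean of the conditional risks. A minimal numerical sketch, with wholly hypothetical risks $p_h$ and randomly drawn prior proportions $P_h$, illustrates the point:

```python
import random

def prob_false(priors, risks):
    """P_f as the prior-weighted sum of the conditional risks p_h."""
    return sum(P * p for P, p in zip(priors, risks))

def random_priors(m, rng):
    """Draw m non-negative proportions that sum to one."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)             # fixed seed so the run is repeatable
risks = [0.04, 0.30, 0.92, 0.55]   # hypothetical conditional risks p_h
values = [prob_false(random_priors(len(risks), rng), risks) for _ in range(1000)]

print("p_s =", min(risks), " p_g =", max(risks))
print("range of P_f over the trials:", round(min(values), 3), "to", round(max(values), 3))
```

Without knowledge of the priors, nothing narrower than the interval from $p_s$ to $p_g$ can be asserted; and if one alternative has a risk near unity, the upper limit is correspondingly uninformative.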

This vital restriction calls for further comment. A parameter $p_h$, definitive of an
admissible alternative to the null hypothesis ($p = p_0$), may be indefinitely close
to $p_0$ itself. If so (Fig. 5), $p_h \simeq 1 - p_0$ for samples of finite size; and we can gain
no appreciable advantage by making $p_g = p_0$. This consideration has an important
bearing on the concept of test power touched on below. Here it is relevant because we
can rarely be certain that no such hypothesis alternative to the one chosen as null is 
indeed admissible. This raises a question of pivotal importance in connexion with the 
foregoing exposition: in what situations can one postulate an exhaustive and exclusive 
set of admissible hypotheses which fulfil all the relevant conditions now stated ? 


An important class, in which the postulate of an exclusive and exhaustive set of 
admissible hypotheses is legitimate, arises in pathology when we can:
(i) classify test subjects as healthy or sufferers from a particular disease;
(ii) assign a probability on the basis of laboratory experience to the assertion that
a single test will fail to identify them as one or the other.

For heuristic purposes a criterion for screening tuberculous patients cited by 
Neyman will serve as a type specimen*. On the basis of laboratory experience, we 
assume that a single x-ray film will: 

(a) fail to detect the disease in 40 per cent. of cases when present;
(b) give a positive result for 1 per cent. of healthy test subjects.

Clearly we need to make more than one film, if we aim at a high level of satisfactory 
diagnosis. We shall assume that we take five and adopt as our test criterion the rule 
that we deem the disease to be present if at least one film is classifiable as positive. 
Our first, which we may call the null hypothesis (Hypothesis A), will be that the disease 
is present. Our test criterion leads us to reject the hypothesis if all five films are 
negative. The relevant parameter ($p_a$) is the probability of failure, in this case 0.4.
Hence the probability of rejecting the null hypothesis (i.e. wrongly classifying the
test subject as healthy) is

$$\alpha = (0.4)^5 = 0.01024$$

The alternative hypothesis (Hypothesis B) is that the test subject is healthy. If so,
the probability ($p_b$) of getting a negative result from one film is 0.99, and that of
getting five negative results is $(0.99)^5$. We shall reject the alternative hypothesis if at
least one film is positive, i.e.

$$\beta = 1 - (0.99)^5 = 0.04901$$

* The medical specialist will recognize some arbitrary assumptions in the argument here advanced for
illustrative purposes alone.

The reader will find it instructive to explore the outcome of different test criteria
based on different sizes of sample (i.e. numbers of films per test subject) within the
framework of the foregoing assumptions. In this context, $P_a$ in Equation (i), on
p. 97, is the incidence of tuberculosis in the population. Our truth equation is

$$\begin{aligned}
P_t &= 1 - \beta - (\alpha - \beta)P_a \\
    &= 1 - 0.04901 - (0.01024 - 0.04901)P_a \\
    &= 0.95099 + (0.03877)P_a.
\end{aligned}$$

Thus the test criterion ensures a probability of correct decision a little more than
95.1 per cent. if the incidence of the disease is 1 per cent., and must inevitably ensure
a figure more than 95 per cent. for an overall correct verdict. However, we shall now 
see that unconditional assertions may be of subsidiary interest in connexion with 
decisions of this sort, though the procedure illustrated may well have important 
applications in the domain of differential diagnosis. 
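
The exploration suggested to the reader can be sketched as below. It keeps the rule "deem the disease present if at least one film is positive" and the two single-film error rates assumed in the text, and treats successive films as independent, which the calculation of $(0.4)^5$ already presupposes. The function names are illustrative.

```python
def film_test(k, miss=0.4, false_pos=0.01):
    """Conditional risks when k films are taken and any positive film counts."""
    alpha = miss ** k                       # all k films miss a tuberculous subject
    beta = 1.0 - (1.0 - false_pos) ** k     # at least one positive film for a healthy subject
    return alpha, beta

def prob_correct(alpha, beta, incidence):
    """Truth equation: P_t = 1 - beta - (alpha - beta) * P_a."""
    return 1.0 - beta - (alpha - beta) * incidence

for k in range(1, 9):
    alpha, beta = film_test(k)
    print(k, round(alpha, 5), round(beta, 5), round(prob_correct(alpha, beta, 0.01), 4))
```

Taking more films drives $\alpha$ down quickly but lets $\beta$ creep up, so with a low incidence the overall proportion of correct verdicts need not improve as films are added.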




3. DOMAIN OF CONDITIONAL INFERENCE 

In the foregoing section we have examined a model situation, viz. a fruitfly culture,
to throw light on the relevance of test procedure to unconditional inference, i.e. our
concern has been to assign a probability to a correct decision for or against a hypothesis.
On the assumption that the female deemed to be normal in this context is a new
and valuable mutant, we might also formulate our problem in terms of conditional
inference. Thus we may wish to curtail both the risk of letting lethal genes accumulate
in our stock and the risk of destroying normal stock otherwise available for perpetuating
it. Accordingly, we decide to screen our females by setting up a rejection criterion
which will set an acceptable limit to the risk incurred in retaining a lethal carrier 
and an acceptable limit to the risk of losing an otherwise normal female which carries 
the mutant gene we seek to perpetuate. 

We can likewise, and usefully, regard the issue at stake in a diagnostic test, such
as the one Neyman cites, as comparable to decisions which arise in quality control,
in so far as the theory concerns itself with hazards respectively, though not
quite appropriately*, designated producer's risk and consumer's risk.

* Producer's risk is the risk of rejecting for sale consignments which do in fact attain the standard
guaranteed (Acceptable Quality Level) by the manufacturer. Consumer's risk is the risk of releasing for sale
consignments of a standard just at a level of inacceptability prescribed by the manufacturer. In a later
communication, we hope to discuss the credentials of quality control techniques and concepts, and their
relevance to the clinical trial.

The main preoccupation of the administration in the situation last discussed will in fact have
less to do with an overall assessment of correct judgment than with the penalties
of making mistakes of two sorts. To classify wrongly a tuberculous person is to
deprive him or her of proper treatment. To classify wrongly a healthy person is
likely to cause unjustifiable alarm and despondency. A test procedure which
prescribes that neither risk exceeds what the authorities regard as admissible satisfies
the practical demands of the situation from their viewpoint. We may state these
demands explicitly in the form:

(i) if the test subject is tuberculous, the risk of erroneous diagnosis must not
exceed $\alpha$;

(ii) if the test subject is healthy, the risk of erroneous diagnosis must not exceed $\beta$.

Any unconditional statement we can legitimately make in this context presupposes 
the possibility of classifying the test subjects exclusively as of one or other type; but 
the administrative intention does not change, if we concede that an appreciable 
number of test subjects are unclassifiable by recourse to any available independent 
diagnostic criterion. The test need not then lead to consequences embarrassing to 
authorities content to disclaim responsibility for individuals unless definitely deemed 
to be healthy or tuberculous. Undoubtedly, there will arise in administration many 
comparable costing situations in which a conscientious claim for limiting the require- 
ments of a test to such conditional assertions is admissible; but the propriety of such 
a procedure in the domain of scientific research is at least open to debate. 

Recent literature on quality control techniques justifies the suspicion that some 
writers would advance the claims of conditional decision tests of this type as appro- 
priate in the domain of the prophylactic or therapeutic trial. It is therefore pertinent 
to examine the relevance of the analogy between the end in view of the salesman and 
that of the research worker approaching a clinical trial against the background of 
laboratory experiments in vitro or on animals. In this situation the investigator will 
lightly incur neither the risk of losing credit for major discovery nor that of being 
discredited by subsequent enquiry. If content to follow the practice of the large- 
scale commercial corporation, he will therefore invoke a test procedure which will 
set appropriate limits to the risk of wrongly rejecting the alternatives: 

(i) his own assertion that Treatment B guarantees $b$ per cent. more cures than
Treatment A;

(ii) the assertion of an imaginary critic that Treatment B guarantees only $a$ per cent.
more cures than Treatment A.

By all too easy stages, statistical inspection then becomes a recipe for statistical 
careerism. The investigator and his putative opponent relinquish their proper relation 
as colleagues in the impersonal pursuit of truth to embrace a convention which 
safeguards the amour propre of each. The decision to make the best of a bad job in 
this sense involves an ethical issue which is not amenable to arguments likely to win 
universal assent; but it carries with it an implication which may well damp the 
enthusiasm of the convert. This will come into focus, if we here digress to clarify 
the Neyman-Pearson concept of test power.

In the taxonomic domain of the two-class universe we specify $\alpha$ and $\beta$ in the
following way for the $r$-fold sample, when the criterion for rejecting Hypothesis A ($p = p_a$),
and hence for accepting Hypothesis B ($p = p_b > p_a$), is $x \ge t$:

$$\alpha = \sum_{x=t}^{x=r} \binom{r}{x} p_a^{x} (1 - p_a)^{r-x}\,; \qquad 1 - \alpha = \sum_{x=0}^{x=(t-1)} \binom{r}{x} p_a^{x} (1 - p_a)^{r-x} \tag{i}$$

$$\beta = \sum_{x=0}^{x=(t-1)} \binom{r}{x} p_b^{x} (1 - p_b)^{r-x}\,; \qquad 1 - \beta = \sum_{x=t}^{x=r} \binom{r}{x} p_b^{x} (1 - p_b)^{r-x} \tag{ii}$$

What Neyman calls the power function $F(p)$ of the test for the same size ($r$) of
sample, and for the same criterion score ($t$) is picturable as the graph of the following
function over the range $p = 0$ to $p = 1$:

$$F(p) = \sum_{x=t}^{x=r} \binom{r}{x} p^{x} (1 - p)^{r-x} \tag{iii}$$

It follows that:

$$F(p_a) = \alpha \quad\text{and}\quad F(p_b) = 1 - \beta \tag{iv}$$

For given values of $r$ and $t$, the condition that $\alpha = \beta$ is, of course:

$$F(p_a) = 1 - F(p_b) \tag{v}$$

Having fixed any criterion for rejection of the null hypothesis (Hypothesis A),
and having chosen the alternative hypothesis ($p = p_b$), we speak of $F(p_b)$ as the
power of the test. This being $(1 - \beta)$ is the probability of rejecting the null hypothesis
when it is false, on the assumption that the alternative is the only admissible one.
One test prescription is more powerful than another if it has a higher power in this
sense for a fixed value of $\alpha$, i.e. if it assigns a lower probability to error of the second
kind for the same probability of error of the first. If the two test prescriptions both 
invoke the same distributions, the test which employs a larger sample must be the 
more powerful one. 
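
The power function can be evaluated directly from the binomial sum. The sketch below is an illustration, not the authors' computation: it assumes the rejection rule $x \ge t$ and borrows $r = 144$ and $t = 85$ from the earlier fruitfly example (decision limit 84.5), with $p_a = \frac{1}{2}$ and $p_b = \frac{2}{3}$. The function name is hypothetical.

```python
from math import comb

def power(p, r, t):
    """F(p): probability that the r-fold sample score x is at least t."""
    return sum(comb(r, x) * p**x * (1 - p)**(r - x) for x in range(t, r + 1))

r, t = 144, 85               # values borrowed from the earlier 144-fold example
p_a, p_b = 1 / 2, 2 / 3

alpha = power(p_a, r, t)         # F(p_a): rejecting the null hypothesis when true
beta = 1.0 - power(p_b, r, t)    # 1 - F(p_b): accepting it when the alternative holds

print(f"alpha={alpha:.4f}  beta={beta:.4f}  power={1 - beta:.4f}")
for k in range(8, 19):           # F(p) on a coarse grid from p = 0.40 to p = 0.90
    p = k / 20
    print(f"p={p:.2f}  F(p)={power(p, r, t):.4f}")
```

The exact binomial values differ slightly from the normal-integral figures quoted earlier, since the text works with the fitted normal curve.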

The reader will find it instructive to plot F(p) against p for the following example 
of a test procedure. The null hypothesis is that p — } when r = 144. The rejection 
criterion is x — 82-5. For the distribution prescribed by the null hypothesis the mean 
is 72 with o — 6. Whence the criterion score in standard form is (82-5 — 72) — 6 

1-75. This excludes 4 per cent. of the area of the normal fitting curve, i.e. x — 0-04. 
For this set-up we may tabulate as below: 





TABLE I

   p       M     X = (82.5 − M)      σ       c = X ÷ σ       β      F(p) = (1 − β)
  5/12     60         22.5         5.9161      3.8032     >0.999       <0.001
 11/24     66         16.5         5.979       2.7596      0.996        0.004
 13/24     78          4.5         5.979       0.7526      0.773        0.227
  7/12     84         −1.5         5.9161     −0.2535      0.401        0.599
  5/8      90         −7.5         5.8095     −1.2910      0.099        0.901
  2/3      96        −13.5         5.6568     −2.3865      0.013        0.987
 17/24    102        −19.5         5.4544     −3.5750     <0.001       >0.999
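The arithmetic of Table I may be checked by the normal approximation with a few lines of code. This is a sketch under stated assumptions: the helper names `normal_cdf` and `table_row` are ours, the fractional values of p are inferred from the tabulated means (M = rp), and a row for the null value p = ½ is added for reference.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


r, criterion = 144, 82.5   # sample size and criterion score of the example


def table_row(p):
    """Mean, sigma, standard score c, beta and power F(p) for a given p."""
    mean = r * p
    sigma = sqrt(r * p * (1 - p))
    c = (criterion - mean) / sigma
    beta = normal_cdf(c)            # chance of a score below the criterion
    return mean, sigma, c, beta, 1 - beta


# p values inferred from the tabulated means; (1, 2) is the null hypothesis itself.
for num, den in [(5, 12), (11, 24), (1, 2), (13, 24), (7, 12), (5, 8), (2, 3), (17, 24)]:
    mean, sigma, c, beta, power = table_row(num / den)
    print(f"{num}/{den:<3} M={mean:6.1f} sigma={sigma:.4f} c={c:8.4f} "
          f"beta={beta:.3f} F(p)={power:.3f}")
```

Small differences in the third decimal place between such output and a printed table are to be expected from rounding in tables of the normal integral.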
The concept of test power is easily interpretable in the alternative domain of
representative scoring. The simplest type of test is then one which invokes alternative
hypotheses specifying the true mean score of each of two normal universes. Consider
now the following model. We do not know the mean value (M) of the normal variate;
but we do know that the variance of the score distribution of the parent universe is
2,500, whence it follows that the variance (σ_m²) of the mean of the r-fold sample is 2,500 ÷ r.
For the 100-fold sample σ_m = 5. Our null hypothesis is that M = 18.2. The standard
score corresponding to a sample value (M_x) of the mean is therefore (M_x − 18.2) ÷ 5.
To make α = 0.05, we must make the deviation equal to 1.64σ_m,

i.e. (M_x − 18.2) = 8.2,
whence M_x = 26.4.

The alternative hypothesis, which makes β = 0.05, is that
(M_x − M) = −1.64σ_m,
so that (26.4 − M) = −5(1.64). Whence the hypothesis is that M = 26.4 + 8.2 = 34.6.
If our alternative hypothesis were that M = 28.2, the score deviation
would be (26.4 − 28.2) = −1.8 or −0.36σ_m. At this level β = 0.359. To make
the two risks equal when the sample size is 100 and the alternative hypothesis is
M = 28.2, we must choose the sample value (M_x) definitive of our rejection criterion,
so that:

\[ \frac{M_x - 18.2}{5} = -\frac{M_x - 28.2}{5} \]

In this case M_x = 23.2,
i.e. (M_x − 18.2) = 5 = σ_m,
so that α = 0.159 = β. To equalize both risks as nearly as possible at the level
α = 0.05 = β, when the alternative hypothesis is that M = 28.2, we must enlarge
our sample size (r) so that σ_m = 50 ÷ √r in the identity:

\[ \frac{23.2 - 18.2}{\sigma_m} = 1.64 = -\frac{23.2 - 28.2}{\sigma_m} \]

whence we get √r = 16.4, and r = 269 to the nearest integer.
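The steps of the foregoing example can be retraced numerically. The sketch below assumes the rounded value 1.64 used in the text for the 5 per cent. point of the normal curve; the variable and function names are ours.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


H_05 = 1.64            # rounded 5 per cent. point, as used in the text
SIGMA_PARENT = 50.0    # square root of the parent variance 2,500
M_NULL, M_ALT = 18.2, 28.2


def sigma_m(r):
    """Standard deviation of the mean of an r-fold sample."""
    return SIGMA_PARENT / sqrt(r)


# Criterion fixing alpha = 0.05 for the 100-fold sample.
criterion = M_NULL + H_05 * sigma_m(100)

# Risk of the second kind at that criterion if M = 28.2.
beta = normal_cdf((criterion - M_ALT) / sigma_m(100))

# Equal risks without changing the sample size: criterion midway between the means.
equal_criterion = (M_NULL + M_ALT) / 2
equal_risk = 1 - normal_cdf((equal_criterion - M_NULL) / sigma_m(100))

# Sample size which brings both risks to 0.05 with the midway criterion.
r_needed = (H_05 * SIGMA_PARENT / (equal_criterion - M_NULL)) ** 2

print(f"criterion = {criterion:.1f}, beta = {beta:.3f}")
print(f"equal-risk criterion = {equal_criterion:.1f}, alpha = beta = {equal_risk:.3f}")
print(f"r for alpha = 0.05 = beta: {r_needed:.2f}")
```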
We may generalize the rules of test prescriptions thus:

(1) to fix α at the h_a σ_m level, we make our test criterion

\[ \frac{t - M_a}{\sigma_m} = h_a, \quad \text{so that} \quad t = M_a + h_a \sigma_m \tag{vi} \]

(2) to determine β in terms of h_b σ_m, we then have

\[ h_b = \frac{t - M_b}{\sigma_m} \tag{vii} \]

(3) to equalize the two risks without changing the size of the sample, we make

\[ \frac{t - M_a}{\sigma_m} = -\frac{t - M_b}{\sigma_m}, \quad \text{so that} \quad t = \frac{M_a + M_b}{2} \tag{viii} \]

(4) for equal risks, we specify α in terms of h_a σ_m by the relation

\[ \frac{t - M_a}{\sigma_m} = h_a, \quad \text{so that} \quad h_a = \frac{M_b - M_a}{2\sigma_m} \tag{ix} \]

(5) to equalize risks at the level α = 0.05 = β (so that h_a = 1.64 = −h_b), we
must change the size of sample from r_1 to r_2 so that the new value of σ_m is
r_1^{1/2} r_2^{−1/2} σ_m and

\[ 1.64 = \frac{(M_b - M_a)\sqrt{r_2}}{2\sigma_m \sqrt{r_1}}, \quad \text{so that} \quad r_2 = \frac{(3.28)^2 \, r_1 \sigma_m^2}{(M_b - M_a)^2} \tag{x} \]


The reader will easily adapt the foregoing to the situation in which we require
two parameters to specify the normal universe of a hypothesis, viz. a mean and a
standard deviation. For the simpler case under consideration, the power of the test
(1 − β) for any alternative to the null hypothesis is expressible in the form:

\[ h_b = \frac{t - M_b}{\sigma_m} \tag{xi} \]

We may then write:

\[ P(M_b) = \frac{1}{\sqrt{2\pi}} \int_{h_b}^{\infty} e^{-\frac{1}{2}c^{2}} \, dc \tag{xii} \]

We may now make an exploratory table for test design, on the assumption that
r = 100, σ_m = 5, and t = 26.4, so that α = 0.05 when M_a = 18.2 (Table II):





TABLE II

 Value of M        Level of                         Power of
 definitive of     rejection (h_b)  Corresponding   test         Value of    Value of r
 alternative       expressed as     value of        criterion    β when      when
 hypothesis        h_b σ_m          β               (1 − β)      α = β       α = 0.05 = β
    18.4              1.6           0.945           0.055        0.492       672,400
    19.4              1.4           0.919           0.081        0.452        18,678
    22.4              0.8           0.788           0.212        0.337         1,525
    24.4              0.4           0.655           0.345        0.268           700
    26.4              0.0           0.500           0.500        0.206           400
    28.4             −0.4           0.345           0.655        0.154           259
    30.4             −0.8           0.212           0.788        0.111           181
    32.4             −1.2           0.115           0.885        0.078           133
    34.4             −1.6           0.055           0.945        0.053           102
    36.4             −2.0           0.023           0.977        0.034            81
    38.4             −2.4           0.008           0.992        0.022            66
    40.4             −2.8           0.003           0.997        0.013            55
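The rows of Table II follow from rules (vi) to (x). The sketch below is one way of setting out the calculation; the function name `design_row` and its argument names are ours, and the value 1.64 is the rounded 5 per cent. point used in the text.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def design_row(m_b, m_a=18.2, sigma_m=5.0, r1=100, h_a=1.64):
    """One row of the design table for the alternative hypothesis M = m_b."""
    t = m_a + h_a * sigma_m                      # rule (vi): criterion fixing alpha
    h_b = (t - m_b) / sigma_m                    # rule (vii)
    beta = normal_cdf(h_b)                       # risk of the second kind
    h_equal = (m_b - m_a) / (2 * sigma_m)        # rule (ix): midway criterion
    equal_risk = 1 - normal_cdf(h_equal)         # alpha = beta, same sample size
    r2 = (2 * h_a) ** 2 * r1 * sigma_m**2 / (m_b - m_a) ** 2   # rule (x)
    return h_b, beta, 1 - beta, equal_risk, r2


for m_b in (18.4, 19.4, 22.4, 24.4, 26.4, 28.4, 30.4, 32.4, 34.4, 36.4, 38.4, 40.4):
    h_b, beta, power, equal_risk, r2 = design_row(m_b)
    print(f"{m_b:5.1f} {h_b:5.1f} {beta:.3f} {power:.3f} {equal_risk:.3f} {r2:10.0f}")
```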





Since the size of sample fixes the power of the test for a fixed value of α, we can
set β at a predetermined level appropriate to any single chosen alternative of the null
hypothesis only if we plot P_f for different values of r. Table III sufficiently illustrates
the procedure; and the reader may find it instructive to check the arithmetic by
recourse to the foregoing equations.
TABLE III

POWER FUNCTION (P_f = 1 − β) FOR THE SAME MODEL AS IN TABLE II TABULATED SEPARATELY FOR
DIFFERENT VALUES OF SAMPLE SIZE r WITH THE SAME REJECTION CRITERION (α = 0.05) FOR THE
NULL HYPOTHESIS (M = 18.2) WHEN σ_m = 5 FOR THE 100-FOLD SAMPLE

At the head of the columns are score values (M_x) corresponding to the condition α = 0.05 and
values of σ_m for the appropriate value of r.

                                    Size of Sample
 Hypothesis        81               144               256               361
     M        σ_m = 5.56        σ_m = 4.17        σ_m = 3.125       σ_m = 2.6316
              M_x = 27.3        M_x = 25.03       M_x = 23.33       M_x = 22.52
    19          0.056             0.073             0.082             0.090
    21          0.129             0.166             0.227             0.281
    23          0.221             0.312             0.456             0.571
    25          0.341             0.496             0.702             0.826
    27          0.480             0.681             0.879             0.955
    29          0.622             0.829             0.965             0.993
    31          0.749             0.924             0.993             0.999
    33          0.849             0.972             0.999             0.999
    36          0.942             0.996             0.999             0.999
    39          0.983             0.999             0.999             0.999
    41          0.993             0.999             0.999             0.999
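The dependence of power on sample size may be tabulated directly. In the sketch below the function name `power` is ours, 1.64 is the rounded 5 per cent. point used in the text, and the fourth sample size is taken as r = 361, that being the size for which σ_m = 50 ÷ √r has the tabulated value 2.6316.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def power(m_alt, r, m_null=18.2, sigma_parent=50.0, h=1.64):
    """Power (1 - beta) against M = m_alt with alpha fixed by h for an r-fold sample."""
    s = sigma_parent / sqrt(r)       # sigma_m for this sample size
    m_x = m_null + h * s             # criterion score giving alpha = 0.05
    return 1 - normal_cdf((m_x - m_alt) / s)


sizes = (81, 144, 256, 361)
for m_alt in (19, 21, 23, 25, 27, 29, 31, 33, 36, 39, 41):
    row = "  ".join(f"{power(m_alt, r):.3f}" for r in sizes)
    print(f"{m_alt:3d}  {row}")
```

Reading across any row shows the power against a fixed alternative rising with r while α stays at 0.05.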





We are now in a position to see more clearly the implications of approaching the 
interpretation of the outcome of a prophylactic or therapeutic trial as one of accom- 
modating the producer’s risk and the consumer’s risk in the theory of quality control. 
If we do so, we conceive the test procedure as a game of chance in which the investi- 
gator arranges the stakes to accommodate the inclinations of a wholly imaginary 
contestant. His assertion is that Treatment B guarantees b per cent. more cures than
Treatment A, and this fixes the value of b. His fictitious opponent asserts that
Treatment B guarantees only a per cent. cures*; but because his opponent is merely
a figment of his own fears, all that he can say about a is that: a < b. If he conceives
that his opponent is ready to deny any operational advantage (a = 0) for Treatment B,
he may set his own risk as equal to that of his opponent at a much lower level for a
fixed size of sample than will be possible if his opponent makes a far more moderate
claim (e.g. some value of a between 0 and b). Alternatively, the design of a test to equalize risks at one and the
same level will prescribe the availability of larger samples, if he conceives that his 
opponent, having first denied any advantage, will subsequently concede that there is 
some. 

Having chosen the form of his own assertion (here the numerical value of b), the
only exclusive admissible alternative with which he can equip the imaginary disputant
of his claim is a < b; but the alternative test procedure then prescribes recourse to an
infinite sample as a prerequisite to a firm decision. Within the restricted framework 
of conditional inference, the alternative test procedure can thus offer no simple nor 
unique recipe for validifying the operational advantage claimed for a new procedure. 





* To avoid periphrasis, we here use the word “cure” regardless of the criterion of efficacy. 
In case the reader finds the foregoing argument obscure, it may help to clarify
the issue, if we go back to the data of Tables II and III (in which σ_m = 5 when r = 100).
We first suppose that:

(i) Disputant A initially asserts that M = 18.2 and Disputant B initially
asserts that M = 26.4;
(ii) Both disputants initially agree to accept the outcome of a test which
vindicates the claim of B if the 400-fold sample value of M_x exceeds 22.3.
In that event each takes a 5 per cent. risk of being discredited. We next suppose
that an arbitrator persuades Disputant A to concede that M = 19, and Disputant
B to concede that M = 25.6, still taking equal risk on the outcome of a 400-fold
trial. Disputant A will still lose his case if M_x > 22.3, but each disputant now
incurs a 26.8 per cent. risk (α = 0.2676 = β) of getting an adverse verdict.

To be sure, we may concede the existence of a situation in which there need be no 
ambiguity about the choice of the appropriate alternative. Thus we may suppose that 
costing considerations exclude the desirability of substituting Treatment B for Treat- 
ment A unless it guarantees at least a per cent. more cures. If the protagonist P of the 
superior claims of Treatment B makes the claim that b exceeds a by a specified 
amount, each assertion is definite, and a verdict is obtainable within the framework 
of conditional inference. However, this situation will arise only if protagonist P has 
the folly to claim more than suffices to achieve any immediate practical advantage 
from so doing. 

Throughout this section we have assumed that the current theory of dual risk in 
quality control operates exclusively within the domain of conditional inference. 
Though Neyman and Pearson (1933) appear to entertain as contrary a view as 
possible, our own is clearly the implication of the exposition of the Sequential Ratio 
Test (Wald, 1947), since the latter must eventually lead to a decision in favour of 
one or other alternative hypothesis, each being false. In this connexion it is 
pertinent to cite the following remarks of the Columbia Research Group (1945): 

... In experiments in pure science, in which risks of errors can hardly be fixed with an eye 
to practical consequences, the setting of risks must rest on such difficult concepts as the a 
priori weight of evidence in favour of the hypothesis in question. 


It is unfortunate that other expositors of the Neyman-Pearson theory of test 
procedure are not equally explicit. Wald (1947), who acknowledges it as the parent 
of sequential analysis, confines himself to the conditional domain, as in the following 
statement (italics inserted): 

For any given critical region W we shall denote the probability of an error of the first kind
by α and an error of the second kind by β. The probabilities α and β have the following
important practical interpretation . . . In the long run the proportion of wrong statements will
be α if H_0 is true, and β if H_1 is true.

A procedure interpreted in such terms is, of course, consistent with the entirely 
conditional administrative intention involved in the screening of a population with 
reference to tubercular infection. When this is indeed all we ask of a decision 
test, we may certainly agree with Kendall (1946, p. 272):

The argument does not depend on the relative frequencies of occurrence of the hypotheses
H_0 and H_1; ... There is no concealed form of Bayes’s postulate in this approach.

Unlike ourselves and the Columbia Research team cited above, Kendall himself 
does not appear to concede that the prior probabilities of Bayes’s theorem are indeed 
relevant when our concern is with unconditional inference. Introductory remarks to 
the argument mentioned in the last citation, after emphasizing that (1 − α) merely
defines the probability of accepting the null hypothesis when true, continue as 
follows (p. 270): 

but what about the case when it is not true? We cannot ignore this case, for its possible 
existence is the very reason for carrying out the test. 

To avoid overstatement, let us finally emphasize a distinction implicit in this 
discussion and explicitly recognized in the use of the epithet pure as applied to science
in the foregoing comment of the Columbia Research Group. Whether our concern 
is with conditional or unconditional assertion depends less on the subject matter 
than on the end in view. If we invoke statistical procedure to screen statements 
worthy of assimilation in the corpus of scientific knowledge, the only assertions 
relevant to the intention are unconditional in the sense in which we here use the term. 
If we invoke it as a guide for day-to-day decisions in commerce or government, a 
conditional type of assertion is commonly appropriate to our requirements. 
Failure to appreciate the level at which the need for a clear-cut choice arises has been, 
and is, the source of perennial confusion. A still growing literature on statistical 
methods for bioassay bears witness to this confusion, which is resolved when we 
recognize that statistical preoccupations relevant to bioassay as an instrument of 
commercial production or legal inspection are essentially different from those which 
should carry weight in the conduct of physiological research with the advancement 
of knowledge as its goal. By the same token, the main preoccupation of a research 
staff attached to a pharmaceutical firm need not be the same as that of a university 
department, and the statistical tests appropriate to their different aims will then also 
be different. 


4. A SIMPLIFIED MODEL OF CURRENT TEST PROCEDURE 


A simple model will here clarify the implications of carrying out the significance
test most widely practised in the context of the clinical trial, viz. the χ² (1 d.f.)
test for the 2 × 2 table on the assumption that there is no treatment difference. If p_a
is the true proportion of cures by Treatment A, and p_b that of cures by Treatment B,
we may speak of k as the operational advantage of the second treatment if p_b = p_a + k.
The conventional null hypothesis is that k = 0, and the current procedure is to fix
the rejection criterion (most commonly so that α = 0.05) with no concern for any
alternative value that k may have. This restriction of the scope of the analysis is
worthy of comment for two reasons:


(1) the presumptive reason for performing any test is that available evidence points
to a contrary conclusion, viz. k ≠ 0;
(2) the existence of a difference will be of academic interest alone unless k ≥ d_0,
the least operational advantage which would justify the substitution of Treatment B
for Treatment A in practice.

As elsewhere, we may simplify our problem by taking a backstage view of the test
procedure, i.e. we shall assume that we have a reliable figure p_a for our yardstick
procedure, and that p_a has the particular value 50 per cent. Also for simplicity, we
shall assume that the size (r) of each sample is the same. If p_{a.x} and p_{b.x} are our observed
sample estimates of p_a and p_b, their difference being d_x = (p_{b.x} − p_{a.x}),
M_d = (p_b − p_a) = k is the expected value of d_x and the variance of its distribution
(with q_a = 1 − p_a and q_b = 1 − p_b) is

\[ \sigma_d^2 = \frac{p_a q_a}{r} + \frac{p_b q_b}{r} = \frac{2 p_a q_a + k(1 - 2p_a - k)}{r} \]

When p_a = ½, as we here assume, this becomes:

\[ \sigma_d^2 = \frac{1 - 2k^2}{2r} \]

When r ≥ 20, the normal curve will give a close fit for the distribution of d_x, on the
assumption that p_b lies near ½. If p_a = ½, we may thus define a square normal standard
score of unit variance (χ² for 1 d.f.) by the ratio

\[ c_k^2 = \frac{(d_x - M_d)^2}{\sigma_d^2} = \frac{2r(d_x - k)^2}{1 - 2k^2} \]

When k = 0 we may write c_k = c_0. This is our null hypothesis (H_0). We shall
consider the possibility that two alternatives are admissible,
consider the possibility that two alternatives are admissible, 

H_1, that k = 0.10 (10 per cent. advantage)

H_2, that k = 0.05 (5 per cent. advantage), labelling c_k accordingly.

We then have:

\[ c_1 = \frac{(10 d_x - 1)\sqrt{r}}{7}; \qquad c_2 = \frac{(20 d_x - 1)\sqrt{r}}{\sqrt{199}} \]

Since our concern is with the possibility that k > 0, our rejection criterion for
H_0 will be that d_x > d_0. If we adopt the convention α = 0.05, we then put c_0 = 1.64,
so that

\[ c_0 = +d_0 \sqrt{2r}; \qquad d_0 = \frac{+1.64}{\sqrt{2r}} \]

When r = 50, we have then d_0 = 0.164. This being greater than either M_d = 0.1
(on H_1) or M_d = 0.05 (on H_2), (d_0 − k) is positive for each admissible alternative,
so that

\[ c_1 = \frac{0.64\sqrt{50}}{7} \simeq +0.65 \quad \text{and} \quad c_2 = \frac{2.28\sqrt{50}}{\sqrt{199}} \simeq +1.14. \]

For these values of the standard score, the table of the normal integral gives
β = 0.74 when c = +0.65, and β = 0.87 when c = +1.14.
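The figures for samples of 50 can be verified from the variance formula alone. The following sketch assumes p_a = ½ and the rounded value 1.64 as in the text; the function names `sigma_d` and `beta_at_alpha` are ours.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def sigma_d(k, r, p_a=0.5):
    """Standard deviation of the difference d_x between two r-fold sample proportions."""
    p_b = p_a + k
    return sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / r)


def beta_at_alpha(k, r, h=1.64):
    """Criterion d_0 for the null hypothesis k = 0, and beta if the true advantage is k."""
    d_0 = h * sigma_d(0.0, r)            # d_0 = 1.64 / sqrt(2r) when p_a = 1/2
    c_k = (d_0 - k) / sigma_d(k, r)      # standard score of d_0 on the alternative
    return d_0, c_k, normal_cdf(c_k)


for label, k in (("H_1", 0.10), ("H_2", 0.05)):
    d_0, c_k, beta = beta_at_alpha(k, 50)
    print(f"{label}: d_0 = {d_0:.3f}, c = {c_k:+.2f}, beta = {beta:.2f}")
```

The same function with other values of r gives the conditional risk of the second kind for larger trials.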













If our concern is with an overall verdict on the truth or falsehood of the null
hypothesis, these figures speak for themselves in the light of what we have established
in Section 2 above. If our concern is merely with the conditional risk of rejecting a good
and of accepting a bad substitute for Treatment A, we can equalize the two risks by
setting c_0 = −c_1 or c_0 = −c_2. For samples of 50, we then have:

(1) c_0 ≃ 0.5 when k = 0.1, so that α ≃ 0.31 = β;
(2) c_0 ≃ 0.25 when k = 0.05, so that α ≃ 0.40 ≃ β.
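The condition c_0 = −c_k can be solved for the criterion in closed form, since it requires d ÷ σ_0 = (k − d) ÷ σ_k, where σ_0 and σ_k are the standard deviations of d_x on the null and on the alternative hypothesis. The sketch below assumes p_a = ½ as in the text; the function names are ours.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area of the unit normal curve below the standard score z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def sigma_d(k, r, p_a=0.5):
    """Standard deviation of the difference d_x between two r-fold sample proportions."""
    p_b = p_a + k
    return sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / r)


def equal_risk(k, r):
    """Criterion d, standard score c_0 and common risk when c_0 = -c_k."""
    s_0, s_k = sigma_d(0.0, r), sigma_d(k, r)
    d = k * s_0 / (s_0 + s_k)     # solves d / s_0 = (k - d) / s_k
    c_0 = d / s_0
    return d, c_0, 1 - normal_cdf(c_0)


for label, k in (("H_1", 0.10), ("H_2", 0.05)):
    d, c_0, risk = equal_risk(k, 50)
    print(f"{label}: d = {d:.4f}, c_0 = {c_0:.2f}, alpha = beta = {risk:.2f}")
```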


In this way we can make a table of conditional risks for different values of r: 





TABLE IV

           Value of β for α = 0.05          Value of α = β
    r        on H_1       on H_2         on H_1       on H_2
    50        0.74         0.87           0.31         0.40
   100        0.55         0.82           0.24         0.36
   200        0.36         0.74           0.16         0.31
   400        0.12         0.59           0.13         0.24





5. SUMMARY 

(1) Hitherto it has been customary to assess the claims of therapeutic and prophy- 
lactic measures in statistical terms by recourse to tests which invoke a unique and 
so-called null hypothesis, namely that the procedures compared are equally efficacious. 

(2) This procedure has no bearing on the operational intention of the trial, viz. 
to find out how much advantage accrues from substituting one treatment for another. 

(3) Within its more restricted domain, the credentials of any significance test 
which takes within its scope only one hypothesis have now to meet the criticism 
that it takes into account only one sort of error, viz. that of rejecting the hypothesis 
when it is true. 

(4) A procedure which justifies assertions of so limited and conditional a scope 
may be a useful self-disciplinary convention; but its claims to rank as an instrument 
of statistical inference are no longer acceptable. 

(5) The Neyman-Pearson theory of alternative test procedure leaves open the 
possibility of prescribing: 

(a) rules of unconditional statistical inference, as may be possible when we can specify 
the distribution function of every admissible hypothesis; 

(b) rules of conditional statistical inference, when our legitimate concern is to safeguard 
ourselves against alternative hazards. 

(6) Most commonly, situations in which decision test procedures are consistent 
with the restriction stated under 5(a) arise only when 5(b) is consistent with the
practical aim of the test, e.g. when we approach problems of costing or screening 
from the viewpoint of the administrator; but the dual test conceived in terms of 5(a) 
may have useful applications in medicine, e.g. in connexion with differential diagnosis. 

(7) In the context of the therapeutic or prophylactic trial, we can never restrict 
the range of admissible hypotheses to accommodate the possibility of useful unconditional
assertions. Both the propriety and the practicability of interpreting it en
rapport with 5(b) are questionable, if the end in view is the advancement of science. 

(8) The outcome of our critique is therefore a recognition of the need for a new 
approach to the validification of such trials by methods of comparative estimation. 
These will be the theme of a subsequent communication. 


ADDENDUM. When this communication went to press, the authors were not aware 
of the views on test procedure expressed in two important contributions by Jackson 
(1936) and von Mises (1943). Jackson introduces the concept of test stringency, a
test being most stringent if it assigns a minimal unconditional uncertainty safeguard 
in our sense of the term. Von Mises uses the expression error chance for what we 
call the uncertainty safeguard and success rate for what we denote by P,, for which 
the expression stochastic credibility might be preferable in the common domain of 
test procedure and estimation. 
Present-day society urgently needs reliable measures, both local and national,
of the health of the people. Declines in death rates undoubtedly indicate great 
improvements in public health in terms of survival, but do not necessarily represent 
equal declines in the numbers of the sick and the magnitude of their needs. More 
precise knowledge is needed of ailments and diseases in specific communities, as a 
factual basis to administer present services for the sick, to interpret local trends of 
morbidity, to enlarge the content of epidemiology and translate its findings into 
preventive measures, and to plan new services in changing conditions or newly- 
developing areas. 

But this knowledge must be specified in precise terms. ‘‘Total’’ knowledge of a 
community is an abstract concept, in reality unattainable. Medicine does not deal 
with the total person, nor does public health administer the total community, but each 
selects certain features which to the doctor or administrator seem important. 

In social medicine we must consider it important to seek knowledge concerning 
the preventable ill-health of communities in order that preventive action may be 
taken. Since prevention and the measurement of the preventable are aspects of a single 
process, and since the normal agencies of prevention are local, the laboratory of 
social medicine is the local community, and its interest is focussed on the social 
group rather than on individuals. The life and circumstances of the small community 
or “neighbourhood” are dominant influences on the health of the individual in an 
urban society. This small definable community we take to be the unit of research: 
and we judge as important those environmental factors which can foster or impair 
the life and well-being of the social group. We therefore select for investigation 
those aspects of social and health experience which relate particularly to the pro- 
motion of the health of the group in its own local environment. 


ORIGIN OF THIS STUDY 


No-one concerned with the health of the Scottish people can ignore the problem of 
respiratory tuberculosis in Scotland. It is generally accepted that tuberculosis, commonly 
referred to as a “social disease”, has social causes and consequences; yet these are rarely 
investigated with the precision essential for firm conclusions from which preventive action 
can spring. 

These social factors cannot be considered in isolation from the communal life of the whole 
group whose location and daily activities hold it together. The tuberculous are not a pecu- 
liar category in the population, to be found only in case-histories or in mortality records, 
Tuberculosis is not merely another disease with certain clinical implications. It is a part of
the ill-health of the community, reducing working capacity and disrupting family life. But 
statistical analyses or studies of records seem to stop just at the point at which the day-to-day 
problems of the disease in relation to the whole community begin: such analyses are neces- 
sarily limited by recorded data. On the other hand, the follow-up of individual case-histories 
or the comparison of two systems of therapy does not yield knowledge which can be trans- 
lated into preventive measures for the whole group. The necessary knowledge can only 
come from planned investigation of the social and medical circumstances of the whole 
group in which disease develops. 

Moreover, investigations of sickness frequently have limitations other than those imposed 
upon them by records. Sickness is calculated in terms of hospital-treated persons or in 
averages found from claims for sickness benefit, or from interpretations of answers in sample 
surveys of the lay population. Each type of study commonly seems to indicate that other 
techniques give a different and therefore presumably less accurate picture. These differences 
are inherent in the data themselves: one set of records, collected for one specific purpose, 
naturally excludes any sickness which does not come into that category. Any single technique 
of enquiry yields data which, if isolated from the context in which they have meaning, give 
a one-sided description of group sickness experience. 

This particular project of socio-medical research, arising from the problems of “social 
disease” and especially of tuberculosis, was influenced by the apparent limitations of aim 
and method of previous investigations. Thus our emphasis was placed on features commonly 
excluded, on group life and environment, on social and health experience, and on the cir- 
cumstances and habits which might foster the spread of disease in a community in which 
infection is endemic. These aspects were selected according to their relevance to the incidence,
spread, or control of disease in the community. 

The group or groups studied were to be socially definable units. The whole group in its 
own environment was to be the object of investigation, not merely the categories of “diseased” 
against “controls”. Moreover, the study was not to be based on one approach alone. If 
the sickness experience of the group were to be discovered, illness treated in hospital or 
sickness recorded in claims could not be separated from ailments felt and expressed by the 
members of the community. 


OBJECTIVES OF THIS STUDY
This investigation of a particular urban community within a large city was planned as a 
pre-pilot study for a long-term socio-medical research project. The group selected was one 
that had been re-housed some 16 years, having been removed from an overcrowded neighbour- 
ing area to a newly-built housing estate on the fringe of the city. The present population 
largely consists of members of the original families, and is generally recognized—both by its 
own members and by other people in all parts of the city—as a definable social unit. 


The aspects of group life and conditions which could be studied in this pre-pilot enquiry 
had to be limited. The four main aspects of study selected as relevant to health were: 
(i) housing occupancy,
(ii) general health experience,
(iii) food habits,
(iv) leisure pursuits.
The last two are often publicly stated to be associated with disease, though evidence for such statements 
is not usually forthcoming. 


The information obtained by various techniques of investigation fell into three categories: 
(a) the group's own description of its experience and habits; 
(b) the objective description of the group’s experience as recorded by various authorities and 
agencies ; 
(c) the outsider’s description of the group and its characteristics as seen by the external world. 
Main Objectives Related to Appraisal of Hypotheses and Techniques:

(1) To confirm the opinion that the combination of the three methods of obtaining data
(household survey, analyses of all records, external information) was essential to secure a
full picture of the group’s experience, and that this combination could in practice be realized.

(2) To examine whether the aspects selected as relevant to the spread of disease in a small
community could be usefully investigated by this combination of methods, and whether the
findings could be collated.

(3) To give the research team experience both of the techniques in application, and of the
answers to the questions, the data kept in records, and the judgements of outside informers.


Although this investigation was to be a preliminary to a long-term tuberculosis research 
project, it was considered necessary that it should in certain respects stand by itself as a 
socio-medical inquiry. Tuberculosis as a social disease was not expected to involve a different 
set of social factors from those of other social diseases. Specific objectives were therefore 
put forward in addition to the general assessment of techniques in application and of factors 
to be investigated: 

I. To obtain sound basic data for the group—for example, population data, family and 
household structure, economic and occupational circumstances— which could be reliably used 
in other fields or in a follow-up of this investigation. 

II. To investigate the type of bias and limitation attached to different methods of measur-
ing sickness. Single-source data such as those of sample surveys, hospital-treated sickness, 
sickness benefit claims, and infectious-disease notifications, could be compared, and the 
discrepancies revealed between the single pictures could perhaps be measured. 

It was hoped to maintain the relations to be established with this community in the future, 
to visit this community again, and to lay the basis of regular assessment of sickness and social 
experience at different seasons over a period of years. 


METHODS 


The value of the methods and the reliability of the results depended on a care- 
fully-taken random sample which represented the whole group. The size of the 
sample decided on was 10 per cent. of the total households in the area. A random 
sample was drawn, and where possible the sample was checked against the whole 
community as being representative of size of house, location of flat, type of household 
and so on. 

The information wanted had to be defined and detailed, since it was to be obtained 
from three types of source: 

(i) household interviewing and individual questioning, 

(ii) investigations of all available records of persons or families in the sample, 

(iii) statements made by outside persons who stand in some relation of authority, responsibility, 
or service to the group. 

(1) HOUSEHOLD ENQUIRY.—The subjects of enquiry were limited to:

(a) House: occupancy and usage, washing and sleeping arrangements, etc., 

(b) Food: patterns of meals and feeding habits, cost and frequency of shopping, types of food 
bought and ready-cooked foods eaten, mid-day meal taken, food preferences, etc., 

(c) General Health: sickness experience, childhood ailments, use of patent medicines, ex- 
perience of chronic ailments, illnesses in current month, duration and disability caused, 
major illnesses in previous 12 months, 

(d) Leisure Habits: activities engaged in regularly or occasionally, weeknight and weekend 
habits, frequency of young people “‘going out” in evenings, travelling involved. 

The household and individual questionnaires dealt with these four aspects of daily life. 
The questions themselves were of three kinds:
(i) matters of fact,
(ii) matters of opinion, 
(iii) matters relating to attitudes and habits. 

Special emphasis was placed on the household unit, and the schedule for the housewife 
included material for the whole household as well as questions about herself. She was asked 
about the house and its usage, about shopping and meal habits, about family history, and 
about the children’s health. The adults were questioned about leisure habits, circumstances 
of travelling and of eating at work or elsewhere, and recent health experience. 


(2) INVESTIGATIONS OF RECORDED DATA.—Apart from the four topics (a) to (d), economic
status and social difficulties were also studied. Employment and occupational experiences 
were examined, and financial assistance to household earnings from national agencies or 
from pensions and benefits was noted. Sickness data were collected from the records of 
doctors, hospitals, schools, and public health authorities. Social disharmonies, such as 
minor absenteeism from school, frequent changes of jobs by juveniles, delinquency and petty 
offences, were investigated, and the social difficulties expressed through appeals to welfare 
organizations and guidance councils were noted. 


(3) OPINIONS AND ATTITUDES OF OFFICIALS AND OUTSIDE OBSERVERS.—These informants 
were approached for their knowledge and opinions of the main characteristics of the group, 
and were only prompted or questioned further if their statements needed clarification. Their 
opinions on aspects of group life other than those of particular interest in this enquiry were 
noted but not pursued. 


All the data thus obtained were analysed to give separate tables and distributions 
of population, households, sickness, unemployment, assistance, social difficulties, 
offences, and the like. The material was collated and combined into a composite 
description of the group. In this paper the material presented is limited to that 
relating to the demographic aspects and sickness experience, since these are basic 
to the whole study. Other aspects concerning the socio-economic circumstances of 
the group and the more detailed data for particular categories such as children will 
be discussed in a subsequent paper. 


THE COMMUNITY STUDIED 

The community from which the sample was drawn comprises nearly 2,000 house- 
holds living on a Local Authority housing estate built in the mid-1930s. The streets 
within the estate have no buildings other than the tenements with their four or six 
houses “on a stair”. The main road running along the city side of the estate contains 
some shops, two primary schools, churches, a cinema, and some houses. The other 
three boundary roads are bus routes and consist almost entirely of houses; one has 
a few shops, and another the third primary school of the area. The nearest secondary 
school lies a short distance from the boundary. The nearest places of work are also 
beyond the boundary, but there are several very large concerns close by. There is no 
community centre on the estate and no provision for communal activities for adults 
or young people except for a boys’ club hall. Transport to the city is freely available 
by two bus routes skirting the estate and by tram routes within walking distance. 

All the houses contain a living room whose open fire maintains the hot-water 
system, a small kitchenette with a gas cooker, a bathroom, a medium-sized bedroom, 
and one or two extremely small additional bedrooms. The common stair in these 
tenements gives access both to the street and to the common “back green”. 

The families which originally populated the estate nearly all came from an ad- 
jacent densely-crowded area, whose inhabitants are employed mainly in local concerns, 
and where there are numerous shops and many social facilities. The rehoused 
families were mainly large and growing ones, which were overcrowded in their 
existing houses (and rehoused on that account); and the parents were mainly young 
couples whose families were not yet complete. 

Changes in the population have taken place as a few of the original families 
have moved from the estate. Their places were filled by other families placed on the 
housing waiting list because of overcrowding or some other priority. Other changes 
occurred as children grew up and married, or because of the war. Many of the men 
returned from the Services do not follow the same occupation as before the war, but 
on the whole people still work within reasonable distance of their homes, and in our 
sample nearly all the workers came home for midday dinner. 

A great many of the families on the estate have become related by inter-marriage; 
in the sample, groups of five, six, or more families were inter-related by the marriages 
of several children. The bulk of the population consists of the remaining members 
of the original families, and a large proportion of these contain married daughters 
or sons with their young husbands or wives. The population is thus a comparatively 
stable one, and in many cases families move only from one part of the estate to 
another. In the sample of 198 households the length of residence of the parent 
families in their present house is shown in Table I. 


TABLE I 
LENGTH OF RESIDENCE OF PARENT FAMILIES IN 10 PER CENT. SAMPLE, APRIL, 1951 

Year of Moving In                1935-38  1939-40  1941-42  1943-44  1945-46  1947-48  1949-51 
Duration of residence (years)      14-17    12-13    10-11      7-9      5-6      3-4  2 or less 
Percentage of Main Households         42       25        8        7        3       53       73 


Nearly half the sample families were among the earliest settlers, and another 
quarter moved in during the years in which the estate was being completed in the 
middle 1930s. Many of the original habits of this population are still retained. 
Much of the shopping is done in the neighbouring parent area, visits to relatives in 
the parent area are constant, and some of the inter-marriages are between relatives 
in the parent area and young people on the estate. Leisure activities also centre 
largely in the parent area or in the central city area. 

The sample group was taken by a random selection of houses, and all persons 
in each sample household were studied. Of the 198 houses, 56 contained more than 
one family, most of them being related in-laws. Table II shows that the 
distribution of households by numbers of occupants centres in the household of 
four or five persons, but very many households contain six, seven, or more persons. 
Thus 50 per cent. of the sample population of 976 persons live six or more per house, 
and 20 per cent. live eight or more per house. 

TABLE II 
SIZE OF HOUSEHOLDS, APRIL, 1951 

Number of Occupants                      1    2    3    4    5    6    7    8    9   10   11   12    All 
Number of Households of this Size        2   20   25   46   38   24   21   12    6    1    2    1    198 
  Sub-totals (1-3, 4-5, 6-7, 8-12)          47          84        45               22 
Number of Persons included in 
  Households of this Size                2   40   75  184  190  144  147   96   54   10   22   12    976 
  Sub-totals (1-3, 4-5, 6-7, 8-12)         117         374       291              194 
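
The two percentages quoted above can be checked from the household counts. The following is a minimal sketch, assuming the counts as read from Table II; the variable and function names are illustrative only and are not part of the original study.

```python
# Households by number of occupants, as read from Table II (April, 1951).
households_by_size = {1: 2, 2: 20, 3: 25, 4: 46, 5: 38, 6: 24,
                      7: 21, 8: 12, 9: 6, 10: 1, 11: 2, 12: 1}


def share_of_persons(min_size, table=households_by_size):
    """Percentage of persons living in households of at least min_size occupants."""
    total_persons = sum(size * count for size, count in table.items())
    in_large_households = sum(size * count for size, count in table.items()
                              if size >= min_size)
    return 100 * in_large_households / total_persons


total_households = sum(households_by_size.values())
total_persons = sum(size * count for size, count in households_by_size.items())
six_or_more = share_of_persons(6)
eight_or_more = share_of_persons(8)

print(f"{total_households} households, {total_persons} persons")
print(f"six or more per house:   {six_or_more:.1f} per cent.")
print(f"eight or more per house: {eight_or_more:.1f} per cent.")
```

Weighting each household size by its number of occupants is what turns a distribution of households into a distribution of persons, which is the sense in which the text speaks of persons living "six or more per house".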

Table III shows the composition of this population, by age and sex; the distribution 
is bi-modal, rising to one peak in the 15 to 19-year age group and to another 
for the 40 to 49-year age group. The deficits among the 16 to 24-year-old men are 
explicable, because we did not count young men who were members of the households 
but were away on national service and, therefore, absent when the survey was conducted. 
The deficit in the 35 to 44-year-old men can be partly associated with the 
losses of the last war. The deficit in young children is not so easily explained until 
the family composition of the sample is examined. 

TABLE III 
PERSONS IN 10 PER CENT. SAMPLE (198 HOUSEHOLDS), BY AGE AND SEX, APRIL, 1951 

                 Male              Female           Both Sexes 
Age Group    No.  Per cent.    No.  Per cent.    No.  Per cent. 

 0             4     0.8         5     1.0         9     0.9 
 1- 4         44     9.2        29     5.8        73     7.5 
 5- 9         39     8.2        40     8.0        79     8.1 
10-14         64    13.4        65    13.0       129    13.2 

15            14     2.9        20     4.0        34     3.5 
16-19         44     9.2        59    11.8       103    10.6 
20-24         42     8.8        46     9.2        88     9.0 
25-29         43     9.0        32     6.4        75     7.7 
30-34         21     4.4        20     4.0        41     4.2 
35-39         18     3.8        26     5.2        44     4.5 
40-44         29     6.1        42     8.4        71     7.3 
45-49         32     6.7        35     7.0        67     6.9 
50-54         30     6.3        27     5.4        57     5.8 
55-59         17     3.6        20     4.0        37     3.8 
60-64         14     2.9        20     4.0        34     3.5 
65-69         11     2.3         4     0.8        15     1.5 
70-74          9     1.9         8     1.6        17     1.7 
75-79          1     0.2         1     0.2         2     0.2 
80            ..     ..          1     0.2         1     0.1 

Total        476   100         500   100         976   100 

Table IV shows the numbers of households containing, in addition to the main 


TABLE IV 
PARTICULARS OF 198 HOUSEHOLDS IN 10 PER CENT. SAMPLE, APRIL, 1951 











Total 
Without s sub-groupings* .. 119 
(i) Both Parents .. oa 143 
With 1 or 2 in- law groupings 24 
Composition of }—=£———_—_— . - 
Main Families and Without sub- “groupings ee 22 
Others in - 
Household (ii) Widowed or Separated With 1 or 2 in-law groupings 26 
Parent Ae ae 55 
With unrelated sub-tenant 
family ” os - 7 
Age of Husband or Head of 20—- 25-— 30— 35- 40— 45- 50- 55- 60- 65- Total 
House hold} te 24 29 34 #39 #44 #49 54 59 64 69 70 All Ages 
Couples... oS 4 5\;13 | RBisi sb| & 7 4 143 
Main Widowersand Separ- 
ated Men | - - - 3 2 2 3 4 14 
W idows or Separated 
Wives... oS 2 1 | 13 4 5 7 I 8 41 
Couples... a 6 13 10 3 l 33 
Widowed or " Separ- 
ated Persons with 
In-Laws Children .. oo 3 4 I I 9 
Widowed or “Separ- 
ated Persons... 2 | - I l I 2 4 11 
Cc ouples Pe io |= 4 3 7 
Lodgers — —_______—— ——|— ~ - 
Persons - oo 1 - I l 3 





Number of Unmarried Children re- 
maining in Main and Sub- b-Families 0 I 2 3 4 5 6 7 8 9 Total 


Couples ¥ e i | oe ae et ae | Oe ee 9 3 l 2 143 
Main —____— — |— —|— : — 
Widowed or Separated 
Persons -~ Ses 22; 10; 10 | 10 l I I 55 
Couples e -- | 10| 16 7 33 
In-Laws $$ _ _ 
W idowed or Separated 
Persons ne 11 3 3 3 - 20 
Sub- Tenant Couples si WF | 5 | 7 





* Sub-groupings within the main households are defined as married couples or widowed or separated persons who 
occupy a room (with their children, if any), and have separate arrangements for meals and other domestic affairs. 
The Head of the Household is taken to be the male tenant responsible for the house. In some households there is an 
older person (father-in-law or mother-in-law) who is not the head. In households in which there is no husband, the widow or 
the married separated wife is counted as the head of the household. 














family, one or more young families still living in the parent household. The ages of 
the parent couples or widowed householders are distributed about the 45 to 49-year 
group, while those of the in-law couples centre in the 25 to 29-year group. 

In addition to the 49 households in which a young married couple resides in the 
parent household, there are seven households containing two additional young 
families, most of them young couples awaiting a house of their own. These circumstances 
may explain the deficit of young children in the population of the sample. 
There are ten young couples with no children and 22 with only one. 











TABLE V 
COMPOSITION OF 198 HOUSEHOLDS IN 10 PER CENT. SAMPLE, APRIL, 1951 

                                                                                           No.  Total 
(A) MAIN FAMILY WITH NO SUB-GROUPINGS: i.e. Parents or Head of Household and 
    unmarried children (if any)                                                                  141 
    Couple              With unmarried children                                            106 
                        Without children remaining                                          13 
    Male Householder    With unmarried children                                              5 
                        Without children remaining                                           0 
    Female Householder  With unmarried children                                             15 
                        Without children remaining                                           2 

(B) MAIN FAMILY WITH ONE SUB-GROUPING 
  (B.i) With In-laws                                                                              45 
    Couple              With unmarried children and In-laws with children                    7 
                        With unmarried children and In-laws without children                13 
                        Without unmarried children but with In-laws with children            2 
                        Without unmarried children but with In-laws without children         1 
    Male Householder    With unmarried children and In-laws with children                    1 
                        With unmarried children and In-laws without children                 0 
                        Without unmarried children but with In-laws with children            4 
                        Without unmarried children but with In-laws without children         2 
    Female Householder  With unmarried children and In-laws with children                    7 
                        With unmarried children and In-laws without children                 1 
                        Without unmarried children but with In-laws with children            6 
                        Without unmarried children but with In-laws without children         1 
  (B.ii) With Lodgers                                                                              5 
    Male Householder    Without unmarried children but with Lodgers with children            1 
                        Without unmarried children but with Lodgers without children         1 
    Female Householder  With unmarried children and Lodgers with children                    2 
                        Without unmarried children but with Lodgers with children            1 

(C) MAIN FAMILY WITH TWO SUB-GROUPINGS                                                             7 
    Couple              With unmarried children and 2 In-law couples, both with children     1 
    Male Householder    Without unmarried children, but with Lodgers with children and 
                        another Lodger                                                       1 
    Female Householder  With unmarried children and 2 sub-groupings with children            3 
                        Without unmarried children but with 2 sub-groupings, one without 
                        children                                                             2 

There are, however, a large number of “three-generation” households, comprising 
the main couple, a married son or daughter, and a grandchild or grandchildren. 
The composition of the 198 households is shown in Table V (previous page) which 
also includes the few families containing lodger couples or families. Sub-letting is 
not officially permitted, but in exceptional cases where an elderly person is left as 
sole occupant by the death of the spouse and the house is therefore too large, the 
authorities are not adamant about lodgers. The number of lodgers is extremely small, 
and in all except one case the families lived almost as if the lodgers were relations. The 
sharing of a kitchen, certain shopping and cooking arrangements, and some of the 
daytime activities were those of one household just as they were in those families 
where the young couple was closely related. The evening meal and other private 
arrangements were also similar for lodgers and for young in-laws. 

It can be seen that the three-generation households constitute over one-sixth of 
all households in the sample; if the lodging families are included in three-generation 
households (as in fact their habits show that they should be) the proportion is one-fifth. 

Thus this sample represents a population containing a large proportion of elderly 

TABLE VI 
AGE, SEX, AND CIVIL STATUS OF POPULATION IN 10 PER CENT. SAMPLE, APRIL, 1951 

                      Male                        Female                      Both Sexes 
Age Group   Single  Married*  Widowed   Single  Married*  Widowed   Single  Married*  Widowed 

 0- 9         87      ..        ..        74      ..        ..       161      ..        .. 
10-14         64      ..        ..        65      ..        ..       129      ..        .. 

15-19         58      ..        ..        79      ..        ..       137      ..        .. 
20-24         37       5        ..        30      16        ..        67      21        .. 
25-29         20      23        ..         9      23        ..        29      46        .. 
30-34          4      17        ..         4      16        ..         8      33        .. 
15-34        119      45        ..       122      55        ..       241     100        .. 

35-39          2      16        ..         1      24         1         3      40         1 
40-44         ..      29        ..         1      40         1         1      69         1 
45-49          1      31        ..        ..      25        10         1      56        10 
50-54          2      25         3        ..      23         4         2      48         7 
35-54          5     101         3         2     112        16         7     213        19 

55-59         ..      15         2        ..      15         5        ..      30         7 
60-64         ..      12         2        ..      13         7        ..      25         9 
65-69         ..       8         3         1       2         1         1      10         4 
70-74         ..       5         4        ..      ..         8        ..       5        12 
55-74         ..      40        11         1      30        21         1      70        32 

75            ..      ..         1        ..      ..         2        ..      ..         3 

Totals       275     186        15       264     197        39       539     383        54 

* “Married” includes those living apart from the spouse. 
















couples, whose older children have married and moved away, whilst their younger 
children, married or unmarried, are still living with them, and a few grandchildren 
are growing up in the families. The population pyramid illustrates clearly the ages 
in which the sample lacks the normal complement of young adults. This is equally 
illustrated by the age-sex-civil state composition detailed in Table VI (previous page). 
The numbers of single or married are sharply differentiated by age; there are only 
seven single women and nine single men over the age of 30. 

On the other hand, sixteen of the 39 widowed women are under the age of 55, 
and most of these have children to support. The 197 married women include thirteen 
living apart from their husbands. Only a very few of these separated wives are young 
women with no children; the majority have several children, and one has eight still 
with her and others grown up and gone away. The majority of these separated wives 
are main householders with all the associated responsibilities; only a few of them 
live with their children in the parent household. The cases of the two separated hus- 
bands with young children are not due to marital difficulties but to long-term 
hospitalization of the wives. 

The composition of the child population, shown in Table VII, is of some interest. 
Even in this sample the uneven numbers of births in the last 15 years and the sharp 
decreases and increases during the war and immediate post-war years are evident. 

The general decline in births is probably as strongly associated with the house- 
hold and family composition as with the declining birthrate in the whole country. 
The average annual birthrate in the sample during the past 18 months is 13.3, the 
most recent city birthrate being 15.7 per 1,000 estimated population. 


TABLE VII 
YOUNG PERSONS IN HOUSEHOLDS OF 10 PER CENT. SAMPLE, BY SEX AND AGE, APRIL, 1951 

                           Male             Female          Both Sexes 
Age Group   Age in     Single  Group    Single  Group    Single  Group 
            Years       Year   Total     Year   Total     Year   Total 

             0            4                5                9 
             1            9                6               15 
 0- 4        2            8      48        5      34       13      82 
             3           10                7               17 
             4           17               11               28 

             5            7                7               14 
             6            6                5               11 
 5- 9        7           13      39        9      40       22      79 
             8            9               10               19 
             9            4                9               13 

            10            8               17               25 
            11            8                7               15 
10-14       12           19      64       15      65       34     129 
            13           11               13               24 
            14           18               13               31 

            15           14               20               34 
            16           18               17               35 
15-19       17           12      58       11      79       23     137 
            18           11                8               19 
            19            3*              23               26 

Total                           209              218              427 

* Boys away on national service and not actually living in the house at the time of the survey were not counted in the 
sample. 

The uneven numbers of births in years when external social or political events 
sharply affected family life indicate other problems superimposed upon that of a 
declining birthrate. The provision of services and facilities for children is difficult 
when numbers fluctuate so much. 

Many other social characteristics of the sample are linked to a greater or lesser 
degree with the demographic structure of the population in the area. The problems 
of children growing up in three-generation households are intensified where the 
young mother is separated from her husband; for separated women who are main 
householders, the difficulties of running a home, earning a living, and caring for 
the children are even greater. 

For the community as a whole, the present composition of the population has 
more serious results. It has been shown that the present structure comes from two 
distributions of different age-spread superimposed upon one another: the majority 
of the population is that of the original families whose members have grown old, 
while the smaller distribution at the younger ages is that of the children who have 
married and settled with their partners in the parent house. Such a population, with 
its deficits of persons in their thirties and its excess of older couples, would have 
problems even with no housing shortage. In addition, the housing shortage may well 
be associated with the restriction of the family size of the young couples awaiting 
a house of their own, and may lead to an even more unbalanced population structure. 
These two distributions of persons in the population typify the two major problems 
of such a re-housed community—the many elderly couples, and the overcrowding 
which ensues as the family expands in houses which do not expand. 

The importance of a population composition of this kind extends far beyond 
this estate: it seems inevitable that the building of housing estates and the policy of 
populating them with families whose present needs are great will result in similar 
populations of similar composition as time goes on. As the population on an estate 
ages still more, other wide-spread problems must arise. Meanwhile, the limitation of 
family size and the continuous overcrowding of parent households seems likely to 
continue while the serious housing shortage continues. 


HEALTH AND SICKNESS IN THE COMMUNITY 


Within the estate, the usual local and national health and social agencies operate, 
and services of many kinds are available. The different Health Visitors concern them- 
selves with the welfare of infants, school children, and certain sick people, and there 
is a Child Welfare Clinic in the centre of the estate. Two District Nurses live on the 
edge of the estate, and various social welfare organizations have workers in close 
touch with the community. 

A number of doctors are in contact with the estate, but most of the families are 
within the practices of one group of doctors who have served this community since 
its inception. Over 80 per cent. of the population in the sample were found to be their 
patients; the composition of this portion of the sample is shown in Table VIII. 
Table VIII (and Fig. 1, overleaf) show that this “practice population” is composed 

TABLE VIII 
SAMPLE POPULATION IN PRACTICES OF MAIN DOCTORS, BY AGE AND SEX, APRIL, 1951 

                 Male              Female           Both Sexes 
Age Group    No.  Per cent.    No.  Per cent.    No.  Per cent. 

 0             2     0.5         5     1.2         7     0.9 
 1- 4         42    10.6        24     5.8        66     8.2 
 5- 9         31     7.8        32     7.8        63     7.8 
10-14         56    14.1        54    13.1       110    13.6 
15            11     2.8        17     4.1        28     3.5 
16-19         34     8.6        49    11.9        83    10.3 
20-24         38     9.6        43    10.4        81    10.0 
25-29         35     8.8        27     6.6        62     7.7 
30-34         21     5.3        17     4.1        38     4.7 
35-39         15     3.8        18     4.4        33     4.1 
40-44         22     5.6        37     9.0        59     7.3 
45-49         25     6.3        29     7.0        54     6.7 
50-54         26     6.6        20     4.9        46     5.7 
55-59         15     3.8        16     3.9        31     3.8 
60-64          8     2.0        16     3.9        24     3.0 
65-69          9     2.3         2     0.5        11     1.4 
70-74          5     1.3         4     1.0         9     1.1 
75-79          1     0.3         1     0.2         2     0.2 
80            ..     ..          1     0.2         1     0.1 

Total        396   100         412   100         808   100 

of people of all ages whose distribution is almost identical with that of the full sample. 
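
The closeness of the two distributions can be examined numerically. The sketch below is illustrative only: it takes the "Both Sexes" counts as read from Tables III and VIII, expresses each age group as a percentage of its own total, and reports the largest gap. The names used are assumptions made for the example and form no part of the original analysis.

```python
age_groups = ["0", "1-4", "5-9", "10-14", "15", "16-19", "20-24", "25-29",
              "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64",
              "65-69", "70-74", "75-79", "80"]

# "Both Sexes" counts read from Table III (full sample) and Table VIII (practice population).
full_sample = [9, 73, 79, 129, 34, 103, 88, 75, 41, 44,
               71, 67, 57, 37, 34, 15, 17, 2, 1]
practice = [7, 66, 63, 110, 28, 83, 81, 62, 38, 33,
            59, 54, 46, 31, 24, 11, 9, 2, 1]


def percentages(counts):
    """Express each count as a percentage of the column total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# Share of the full sample that belongs to the main doctors' practices.
coverage = 100 * sum(practice) / sum(full_sample)

# Absolute gap, in percentage points, between the two age distributions.
gaps = {group: abs(p - f)
        for group, f, p in zip(age_groups,
                               percentages(full_sample),
                               percentages(practice))}
largest_group = max(gaps, key=gaps.get)

print(f"practice population is {coverage:.1f} per cent. of the sample")
print(f"largest gap: {gaps[largest_group]:.2f} points, at ages {largest_group}")
```

On these counts no age group differs by much more than one percentage point between the two distributions, which is consistent with the statement that they are almost identical.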

The sickness data assembled in this study were obtained from the different sources 
outlined above, from the doctors, and from interviews and questions in all the 
houses in the sample. The information from national and local authorities was 
abstracted in different ways according to the form and content of the records. 
Much of the basic background was provided by health agencies: but other agencies, 
whose main concern was with economic or social assistance, also provided much 
information on special aspects of sickness or disability recognized as contributing 
to the family need. But since each agency records only certain information relevant 
to that aspect of sickness for which it has to make provision, the picture obtained 
from records of outside agencies is not at all uniform. Thus, infectious diseases 
notifications to the Public Health Department are limited mainly to those of children’s 
notifiable diseases, and these again mainly refer to the first case in the household if 
the child be under 5 years of age. Hospital records are concerned only with those 
persons brought to their notice, so that for example an older child brought to 
hospital because of an infectious disease will appear in hospital records. But infectious 
diseases of older children treated at home do not appear in any records save perhaps 
those of doctors. Similarly, other records include only a selected portion of the 
population actually sick: and the basis for selection is not necessarily a medical one. 
Claims for sickness benefit, for instance, do not include any sickness lasting less than 
three days. It appears that official records inevitably classify an illness as “new” at 
the stage when it is first brought to official notice. So each set of records describes 
not illness as it occurs but illness by the definition of what is conspicuous and is 
brought to the notice of the particular agency. 


[Figure: population pyramid. Vertical axis: age in April, 1951. Horizontal axis: number in age group, males to the left and females to the right. Outlines shown: persons in 10 per cent. sample of estate; persons in sample in practices of main doctors; persons in sample who have had sickness benefit, 1948-51.] 

Fig. 1.—Persons attended by main doctors and drawing sick benefit, 1948-51. 

The sample survey questionnaire, investigating aspects of sickness additional 
to those concerning the official agencies, included one group of questions on sickness 
felt, disability caused, and common or recurrent ailments, and another on children’s 
health, opinions of family health, habits relating to the maintenance of health, and 
the buying of patent medicines. The questions on family matters were addressed to 
the housewife, and individuals were asked about their own sickness experience. 

A body of valuable material not usually obtainable was made available through the 
interest and co-operation of the main doctors in practice in the area. Through their 
records, details of the hospital-referrals and of some other recorded sickness in the 
“practice population” were obtained, and the doctors also made a detailed statement 
of all the common, recurrent, and chronic illnesses or complaints in all the households 
known to them through frequent contact. These full data, together with all the data 
from other sources, have provided an exceptionally comprehensive description of 
sickness from several viewpoints. Since the “practice population” forms such a very 
large proportion of the sample and represents its distribution so closely, this many- 
sided description is considered to be indicative of the sickness in the whole community. 

It is evident that each set of data, if taken alone, would present a very different 
picture not only of sickness in the community but also of the prevalence of specific 
diseases. However, most of the sets of data have a certain “overlap” which contains 
persons who come to more than one agency for some kind of aid. From this overlap 
it became apparent that different records were labelling different stages of disease 
or even different aspects of sickness by the same name. Some system of uniform 
terminology was essential, and from our standpoint this could only begin with the 
persons who first suffered ill-health; this initial stage in ill-health was to be discovered 
by the survey questionnaire. The next stage in ill-health was that in which the patient 
consulted the doctor for advice and treatment. A further stage, in which diagnosis 
was obscure or illness serious, led to hospital referral: later stages were hospitalization, 
and, possibly, follow-up. At a variable point in this sequence, claims for benefit 
would be made by insured persons unable to continue working. 

In collating the different records to build up this sequential picture of ill-health 
in the community, the various limited aspects of sickness, taken as discrete by separ- 
ate agencies because their medical or social involvement was strictly limited, had to be 
carefully studied and defined. The “area of overlap” made it possible to arrive at 
some comparable standards for the different records, to combine them, and to obtain 
a composite record of “sickness officially recognized”. The recognition of such 
sickness is based on criteria relating to available provision for the relief of urgent 
need—which may be financial or social. This is not similar to sickness recognized 
by the doctor, whose criterion is in the main that of conspicuous suffering, either 
traceable to an external cause or subjectively felt. It also differs from that recognized 
by the family of the sick person, for the family criteria are mainly non-medical: 
conspicuousness of unusual disability, disturbance to domestic routine, sympathetic 
attitude hoped for from the doctor, and difficulties involved at work or school in 
claiming disability. 

The most serious stage of illness is to be expected in the hospital-treated population. 
In the “practice population” households, the persons referred from 1948 
onwards to hospital or public health services for observation and possible treatment 
constituted exactly 25 per cent. Table IX shows the age and sex distribution of this 
group and compares other records for the same “practice population”; and Fig. 2 
illustrates these distributions. It can be seen that the numbers of hospital-referred 
men and women are very similar (23.7 and 26.2 per cent. respectively). In each age 
group there is little difference visible between males and females, but at the ages 
10-14 and 15-19 the referrals seem to be high. In the older age groups the hospital 
referrals form a considerable proportion of the population at risk. Though the 
numbers are too small to admit of statistical interpretation, the greater hospital 
provision likely to be needed by an ageing population is clearly indicated. 
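
The referral percentages quoted above follow directly from the totals in Table IX. A minimal sketch of the arithmetic, using the totals as read from that table and illustrative names that are not the authors' own:

```python
# Totals for the practice population, 1948-51, as read from Table IX.
practice_population = {"male": 396, "female": 412}
hospital_referrals = {"male": 94, "female": 108}


def per_cent(cases, population):
    """Cases as a percentage of the population at risk, to one decimal place."""
    return round(100 * cases / population, 1)


referral_rates = {sex: per_cent(hospital_referrals[sex], practice_population[sex])
                  for sex in practice_population}
overall_rate = per_cent(sum(hospital_referrals.values()),
                        sum(practice_population.values()))

print(referral_rates)
print(overall_rate)
```

The same function applied within a single age group gives the age-specific proportions referred to in the text, though with denominators as small as those in the older groups the resulting percentages are unstable.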

TABLE IX 
RECORDED SICKNESS AMONG “PRACTICE POPULATION” 1948-51, BY SEX AND AGE 
Population in Practice (P)        Additional Doctors’ Records (D) 
Hospital Referrals (H)            Claims for Sickness Benefit (C) 

                            Males                 Females               Both Sexes 
Age Group               P    H    D    C       P    H    D    C       P    H    D    C 

 0- 4                  44    8   14   ..      29    4    1   ..      73   12   15   .. 
 5- 9                  31    4    4   ..      32    8    3   ..      63   12    7   .. 
10-14                  56   11   ..   ..      54    7    3   ..     110   18    3   .. 
15                     11    1    1   ..      17    3    1   ..      28    4    2   .. 
16-19                  34    9   ..    8      49    9    3   18      83   18    3   26 
20-24                  38    5    2   14      43    7    2   18      81   12    4   32 
25-29                  35    7    2   17      27    8   ..    6      62   15    2   23 
30-34                  21    3    1    7      17    3    1    2      38    6    2    9 
35-39                  15    5   ..    6      18    7   ..    2      33   12   ..    8 
40-44                  22    7    3   12      37   19    3    5      59   26    6   17 
45-49                  25   15   ..   17      29   12    2    7      54   27    2   24 
50-54                  26    5    2   13      20    7    3    4      46   12    5   17 
55-59                  15    4    1   12      16    7    1    2      31   11    2   14 
60-64                   8    6   ..    4      16    6   ..    1      24   12   ..    5 
65-69                   9    1   ..    5       2    1   ..   ..      11    2   ..    5 
70-74                   5    3   ..   ..       4   ..   ..   ..       9    3   ..   .. 
75                      1   ..   ..   ..       2   ..   ..   ..       3   ..   ..   .. 

Total Practice 
  Population          396   94   30  115     412  108   23   65     808  202   53  180 
Percentage             .. 23.7  7.6   ..      .. 26.2  5.6   ..      .. 25.0  6.6   .. 

Total Practice 
  Population of 
  Working Age         258   ..   ..  115     272   ..   ..   65     530   ..   ..  180 
Percentage             ..   ..   .. 44.6      ..   ..   .. 23.9      ..   ..   .. 34.0 

[Figure: outlines by age in April, 1951, of persons in the practice population with recorded sickness.] 

Fig. 2.—Persons with one or more recorded ailment, 1948-51. 

The doctors’ records noted not only hospital referrals but also certain cases of 
illness, such as some infectious diseases, certificates in certain cases of incapacity, 
and complaints which might involve referral to another agency at some time in the 
future. Although these cannot by any means be taken as representing the full comple- 
ment of such ailments in the community—since a doctor’s recording depends on the 
time available and the place in which the patient is seen, as well as on the seriousness 
of the complaint—the children’s infectious diseases recorded can be taken as repre- 
sentative. The recorded incidence of infectious diseases in small boys of pre-school 
age is significantly higher than that in girls of the same age. Whether this difference 
—which appears quite startling in the outline illustrating doctor-recorded sickness 
in Fig. 2—represents a real sex difference of such magnitude is debatable: it is certain 
that in this district mothers bring little boys more readily to the doctor. 

The hospital-referred and doctor-recorded persons were found to suffer from a 
variety of ailments which are described in broad groups in Table X. The symptoms 
and diagnoses found in doctors’ records, in sickness benefit claims, and in the answers 
to the survey questionnaire could not all be classified within a single diagnostic 
system. The classifications adopted are non-clinical, but as far as possible the groups 
preserve some clinical homogeneity. The choice of these non-overlapping categories 
of sickness is necessarily arbitrary, and certain groups or diseases of particular 
interest have been picked out from larger groups. Many of the persons attending 
hospitals were found to be “multiple admissions” to the hospital system of the city, 


TABLE X 
SICKNESS AMONG POPULATION IN PRACTICES OF MAIN Doctors, 1948-51 
Hospital Referrals (H) Additional Doctors’ Records (D) Claims for Sickness Benefit (C) 








Male Female Both Sexes 
Disease Group* H H H 
H and Per H and Per H and Per 
and C D cent. and C D cent. and C D cent. 
D C D LC D Cc 
Respiratory .. -. | 22 | 4 61 i$-4; 17 | 2 40 9-7 39 69 101 12-5 
Ear, Nose, and Throat 17 11 27 6°8 19 7 26 6°3 36 18 53 6:6 
Digestive i rhs 16 9 23 5-8 | 13 4 16 3-9 29 13 39 4-8 
Ulcer .. ie ren lc 8 14 3-5 3 2 4 1:0 14 #10 18 2:2 
Skin... a eee | 8 19 4-8 ‘16 2 18 4-4 29 10 37 4-6 
Aches and Pains ie 5 24 28 7-1) 13 7 19 4-6 18 31 47 5:8 
Injuries 256 —« | ool 37 | 12-4 12 9 16 3-9 32 46 65 8-0 
Gynaecologica oe ey Lr a6 ~- 25 7| 27 6°6 ; 
Measles se ste DDT 13 3-3 a oe 3 a 6 |<. 16 2-0 
Whooping-cough nae Gi i. 6 i eee ae - a m4 4A 6 
Scarlet Fever . a is a 3 l ‘ oe aes oe is Beat ies I 
* Group                  Some of the disorders included
Respiratory              Colds and influenza, but not pneumonia nor respiratory tuberculosis.
Ear, Nose, and Throat    Tonsillitis, inflammation of ear.
Skin                     Dermatitis, boils, abscesses.
Digestive                Gastric infections, but not ulcers.
Ulcer                    Gastric, duodenal.
Aches and Pains          Rheumatism, arthritis, lumbago, sciatica, headache.
Injuries                 Burns, concussions.
Gynaecological           Breast abscess, mastitis.

though not necessarily to any one hospital. Moreover, from the number of hospital 
attendances for specific diseases or groups of diseases, the calculated prevalence of 
certain ailments would appear much higher than the actual prevalence of the diseases 
assessed according to the number of persons suffering from them. 

A further analysis of the material presented in Tables IX and X shows that young 
women are referred to a health agency for a multiplicity of reasons, among which 
the investigation of respiratory conditions is prominent. Among older men, res- 
piratory disorders requiring hospital action are considerable. In all age groups above 
25, the additional doctor-records are few, and there is little sex difference either in 
numbers or in diseases recorded. 

Claims for sickness benefit represent other criteria of seriousness of an illness, 
not necessarily through the involvement of a health agency, but through the incapacity 


TABLE XI 
PERSONS CLAIMING BENEFIT AND NUMBER OF CLAIMS IN FULL (10 PER CENT. SAMPLE), 1948-51 





Male Female Both Sexes 
Age Group Number Claims , Number Claims Number Claims 
of per of per of per 
Persons Person Persons Person Persons Person 
1-19 {Persons 10 19 2» Ia 32 |: 56 
20-24 ‘Suen sf 1-69 = 1-55 - 1-61 
25-29 Peed = 1-45 . 2-29 = 1-66 
30-34 (‘ . 2-14 : 1-67 = 2-0 
s { ; 1-29 : 1-60 cS 1-42 
on {fo 7 1-29 : 1-5 - 1-35 
45-49 {Soon a 1-95 . 2-0 > 1-97 
50-54 {ioe + 1-27 : 1-6 -u 1-35 
5-59 {Persons | 13 2-31 : 23 16 2-31 
60-64 pend ‘3 2-43 1-0 ' 2-25 
+ {herons | 6 25 6 2-5 
Altvges{cutime, | Ee Bs BR 
caused and consequent loss of earnings in the family. The number of men who 
claimed benefit at any time since 1948 was 46.8 per cent. of all men between the 
ages of 16 and 69, while of the women between the ages of 16 and 64, 24.5 per cent. 
claimed benefit. It has to be remembered, however, that the women in the sample are 
married, and though a number of the older married women are at present insurable 
because of full-time or part-time employment, comparatively few of them have been 
in continuous employment and may not in the past have been eligible to claim for 
sickness or injury. Hence, the difference between male and female claims does not in 
any way represent a sex difference in sickness prevalence. Table XI 
shows the age-sex distribution of the sickness and injury claims of the persons 
claiming; it indicates in each sex certain differences in claims at different ages, though 
it cannot be used to compare the two sexes. 

Among young men the proportions rise progressively from the group aged 16 to 19, 
reaching 50 per cent. in the group aged 25 to 29; above the age of 35 the proportions 
begin to rise again, and in the group aged 40 to 49 over 60 per cent. of all males have 
claimed benefit. 

In Fig. 2, showing the age-sex distribution of persons who claimed sickness 
benefit, can be seen the different pictures presented by hospital-referred persons and 
by claimants for benefit. Among females, the high proportion of claims by young 
women under 25 is noticeable; at ages above this the insured population at any time 
since 1948 cannot be assessed and therefore the numbers claiming benefit cannot be 
related to the female population at risk. 

These claims are by no means single occurrences. Many persons have claimed 
more than once, and some have made a sequence of claims for the same disability. 
Table XI illustrates the difference between sickness estimated by number of claims 
and sickness by number of claimants. Table XII shows that the false 
picture of prevalence shown by the number of claims applies not only to the total 
volume of sickness but also to specific diseases, particularly the respiratory group. If 
prevalence were judged by the number of claims, the results would be an over-estimate 
of the prevalence of incapacitating respiratory disorders in the population. 
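
The gap between counting claims and counting claimants can be illustrated with the respiratory figures for males of all ages in Table XII (87 claims made by 57 persons, the latter being 18.3 per cent. of the related population). What follows is only a sketch in Python under stated assumptions: the function name `claims_inflation` is hypothetical, and the "claims" percentage is what would be reported if every claim were mistaken for a separate sufferer; neither forms part of the survey's own method.

```python
def claims_inflation(claims: int, persons: int) -> float:
    """Ratio by which a count of claims overstates the count of claimants."""
    if persons <= 0:
        raise ValueError("persons must be positive")
    return claims / persons


# Respiratory group, males of all ages, as read from Table XII.
claims, persons, persons_pct = 87, 57, 18.3

inflation = claims_inflation(claims, persons)

# Hypothetical figure: the percentage that would result if each claim
# were treated as one affected person.
claims_pct = persons_pct * inflation

print(f"claims per claimant: {inflation:.2f}")
print(f"by persons: {persons_pct:.1f} per cent.; by claims: {claims_pct:.1f} per cent.")
```

On these figures there are roughly one and a half claims per claimant, so a prevalence judged by claims would stand at about 28 per cent. instead of 18.3 per cent.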

These records show, as did hospital referrals, that a considerable proportion of the 
recorded sickness comes from a small proportion of the population, who have inter- 
mittent attacks and multiple sickness requiring assistance of some kind. The duration 
of disability shown by sickness claims indicates that the great majority of respiratory 
disorders which cause frequent claims are not of long duration. The diseases for which 
long-term incapacity is claimed include the ulcers, and the data reveal these as having 
considerable prevalence in this group. 

The hospital-referred group also indicates a high prevalence of ulcers, rather 
higher than would have been thought from the number of sickness claims. It is 
surprising that for a condition such as this there should be any difference between 
the prevalence indicated by these two reliable and carefully-kept sets of records. 

The ill-health described to us in the sample survey, in which most of the adult 
members of every household were interviewed, indicates a picture of disease prevalence 
TABLE XII 
PERSONS CLAIMING SICKNESS BENEFIT AND NUMBER OF CLAIMS, 1948-51 (10 PER CENT. SAMPLE) 
(Percentages are of population in related age groups) 





Male Female 
Selected Disease Groups 
16-34 35-54 55 and All ages 16-34 35-54 55 and All ages 
yrs yrs over from 16 yrs yrs over from 16 

Locomotor* 
Claims 8 12 12 32 5 7 12 
Persons 6 12 9 27 3 5 - 8 
Per cent. 4.0 11.0 17.3 8.7 3.8 2.3 

Respiratory 
Claims 31 31 25 87 31 11 3 45 
Persons 21 22 14 57 22 6 1 29 
Per cent. 14.0 20.2 26.9 18.3 14.0 4.6 8.5 

Ear, Nose, and Throat 
Claims 10 5 15 12 1 13 
Persons 8 5 - 13 6 1 7 
Per cent. 5.3 4.6 4.2 3.8 2.1 

Digestive 
Claims 4 5 2 11 2 5 7 
Persons 4 5 2 11 2 3 5 
Per cent. 4.6 3.5 1.5 

Ulcers 
Claims 2 10 2 14 1 1 2 
Persons 2 5 2 9 1 1 2 
Per cent. 4.6 2.9 

Skin Disorders 
Claims 5 2 2 6 6 
Persons 5 5 1 11 3 3 
Per cent. 3.3 4.6 3.5 

Injuries 
Claims 23 12 14 49 7 3 2 12 
Persons 21 12 9 42 7 2 1 10 
Per cent. 14.0 11.0 17.3 13.5 4.5 2.9 

Gynaecological Disorders 
Claims 5 | 10 
Persons 4 4 8 
Per cent. 2.3 





* Includes rheumatism, lumbago, arthritis. See Table X for other categories of diseases. 

different from that presented by official records of hospital-treated persons. The 
questions referred to sickness within the previous 2 months (i.e. since January, 1951), 
and to major illness in the previous year (i.e. since March, 1950); and questions 
concerning common ailments were limited to a few categories. The answers are 
summarized in Table XIII. Many questions were not answered because people 
said, “I don’t remember when it was”, or described their feelings without knowing 
“what the trouble was”. 

In comparing the stated occurrences of sickness or doctor-visits—which informants 
specified quite definitely—with the dates officially recorded for these occurrences, 
we found many in which there were several months’ difference. In one case of 
hospitalization described as being “a few months ago”, we found that it had in fact occurred 
more than a year before the time given by the informant. 

A considerable number of people described themselves as frequently suffering from 
colds or headaches, and respiratory complaints and “backaches” were also mentioned 
often. But the numbers “suffering from” a condition seemed to decrease as familiarity 
TABLE XIII 
CHILD AND FAMILY HEALTH, AND AILMENTS OF INDIVIDUALS 
Replies to Certain Questions asked in 133 Families 
(a) FAMILY HEALTH 





Housewives who said 
Question Families cinetenapesiinis,consatantetairenaee 


41 with children of 
schoolage .. 5a 40 | 
**Are the children healthy?” .. rH . — - 
25 with children over 











school age... is ya 
“Is the family healthy ?” és _ ssa 133 130 3 
**Are colds common in the whole family ?”’. . 133 60 67 
“Are other ailments common?” es a 133 16 48 
(b) RECURRENT AILMENTS 
Individuals Children 
Persons who Mothers who 
Ailments named said they were Ailments named said children were 

sufferers sufferers 
Chilblains ae we 22 Chest ord *. me 14 
Coms .. a ate i 51 Running Nose i 6 
Stomach Trouble Ee = 37 Running Ears 3 
Aches and Pains a ve 62 Skin 4 
Cough se re 5 Feet ae ‘i 5 
Chest i 5 Aches and Pains .. z 
Skin = i 4 
Ulcers... .. | Not specified 4 
Varicose Veins .. jp erioner I 
Diabetes .. oe I 





with the name of the condition decreased: in the initial “common ailment” question 
in which we named no specific illness, and also in the “recent illnesses” questions, 
a number of informants told us that they had had no ailment. When, however, various 
common conditions were named, informants often recalled being troubled by some of 
them. The prevalence of diseases “suffered” by our informants corresponded closely 
with fluency in sickness terminology. Whether this represents a real distribution of 
sickness in the population in general is debatable: in this sample it is certain that it did 
not, for we have objective data which contradict it from a number of sources trained 
in diagnosis. 

The sickness described by informants as recently experienced or recurrently felt 
should correspond, to some extent, with two “stages” of ill-health in the community, 
namely, illness for which the sufferers consult the doctor and illness which the sufferers 
accept as chronic. The latter category, however, was described independently by 
the main doctors, and the picture was very different from that emerging from the 
household interviews. Of the 159 households in the “practice population”, there were 
TABLE XIV 
RECURRENT AILMENTS IN “PRACTICE POPULATION” 


(a) Persons stated by Main Doctors to be Sufferers from one or more Recurrent or Chronic 
Complaints, by Age and Sex, April, 1951 





Male Female Both Sexes 
Age Group Per cent. of Per cent. of Per cent. of 
No. Practice No. Practice No. Practice 
Population Population Population 

Under 1 .. - - 1 1 

1-4 12 28.6 9 37.5 21 31.8 
5-9 10 32.3 16 50.0 26 41.3 
10-14 23 41.1 25 46.3 48 43.6 
15-19 11 24.4 29 43.9 40 36.0 
20-24 9 23.7 18 41.9 27 33.3 
25-29 9 25.7 9 33.3 18 29.0 
30-34 6 28.6 5 29.4 11 29.0 
35-39 9 60.0 13 72.2 22 66.7 
40-44 9 40.9 20 54.1 29 49.2 
45-49 18 72.0 24 82.8 42 77.8 
50-54 6 23.1 9 45.0 15 32.6 
55-59 8 53.3 11 68.8 19 61.3 
60-64 5 62.5 10 62.5 15 62.5 
65-69 i. sie 5 55-6 | 6 54-6 
70-74... i 3 : 2 . 55:6 
75-79 ~ : 
80 or more - 1 1 
All Ages 143 36.1 203 49.3 346 42.8 





(b) Households in Doctors’ Practices containing one or more Sufferers from a Recurrent or Chronic 
Complaint, April, 1951. 





Status                                                                    No.    Per cent.
With chronic or recurrent complaint                                       126    79.24
Well-known to Doctors but with no known chronic or recurrent complaint     10     6.29
Not well-known to Doctors and rarely visited by them                       23    14.46
Total                                                                     159    99.99





comparatively few for which the doctors had no knowledge of chronic condition 
or illness of any kind. Table XIV shows the number of households in which the 
doctors stated that one or more members had some recurrent or chronic complaint. 

Apparently in this community almost every family has at least one member 
whom the doctors put into this category; in many cases the entire household is 
described by the doctors as suffering from chronic respiratory ailments or continually-
recurring gastro-intestinal disorders. These conditions are rarely brought to the 
doctors for medical treatment, but are known to the doctors from visits to the houses 
when another member of the family was “really ill”. 
Persons with recorded illness or recurrent conditions, 1948-51. 
The age distribution of the persons whom the doctors described as having a 
recurrent ailment, depicted in Fig. 3, calls attention to the needs of an older popula- 
tion. The diseases which recur in this group are described in Table XV. Certain 

TABLE XV 
RECURRENT SICKNESS 
(a) IN FAMILIES AND INDIVIDUALS IN “PRACTICE POPULATION” (SELECTED SICKNESS GROUPS) 





Persons with Sickness Persons in Families 
Type of Sickness in which more than 
Male Female Both one is Affected 

Respiratory Disorders ee = 4 . a 
Tonsillitis 10 19 29 21 
Dyspepsia 9 5 14 
Ulcers 7 15 22 5 
Locomotor 7 11 18 8 
Aches (Head and Back) 10 11 21 8 
Anaemia 1 19 20 5 
Neurosis and Anxiety 7 17 24 4 
Skin Disorders 30 35 65 55 
Gynaecological Disorders - 14 (14) 
Heart 7 6 13 





(b) IN MEN AND WOMEN IN “PRACTICE POPULATION”, BY AGE. 





55 
Age Group. . 15 24 25- 34 35 44 45- 54 or more 





. sf Male 11 2 5 3 I 
Minot ‘ Female 16 5 2 8 I 
Respiratory 
. Male - I 3 5 5 
Other 4 Female 7 : 3 5 
as Male 2 : 2 2 3 
Dyspepsia .. ia - ma aa { Female | ? 1 
Male 2 5 2 6 3 
Ulcers Female 3 l I 
; f Male - I mee 
Anaemia .. oe -_ _ = ‘ Female : 4 7 4 2 
Neurosis and Anxiety —., : 7 4 ») 
Skin Disorders... = _ { Fonale R : : : 
: Male 3 l 2 
| Obesity... - os os 2s Female 4 2 3 8 4 
Varicose Veins - oll “i , . 
Gynaecological Disorders ey Female | 3 7 3 
ailments such as obesity, which did not appear in records at all, appear for the first 
time in these doctors’ descriptions and are evidently prevalent. This volume of chronic 
sickness has been presented as such by doctors who know the people intimately and 
serve a large population, and it cannot be dismissed as imagined complaints 
expressed in interviews. Many of the chronic ailments are not minor conditions, and 
are considered by the doctors to be preludes to serious disability in the future. Yet the 
subjects are so used to living with their “trouble” that they hardly look on it as an 
illness requiring medical remedy, but regard themselves as healthy. 

To define health is even more difficult than to define sickness. The underlying 
assumption seems to be that anyone who is not ill is healthy, that there are no grades 
of impaired health but only two sharp opposites. This basic assumption seems to be 
shared by official agencies and lay public alike, though the definitions applied in 
practice by official agencies are based on different criteria from those apparent in 
popular lay behaviour. Sickness as noted by any one agency has previously been 
described as “what is brought here and is eligible for assistance”. So most health 
services in practice use the unstated definition of health as “any condition which does 
not absolutely necessitate intervention on our part”. In the community, sickness 
is acknowledged by most people only if it is a conspicuous disability which differs in 
type or degree from habitual disability; it has previously been seen that in our sample 
the people did not over-claim nor over-report sickness which was certainly experienced. 
Similarly, in the community health is assumed to be any state which permits the 
continuance of habitual activities and occupations. For these reasons, among others, 
no adequate measure of health is available. At best, we can measure children’s 
heights and weights, or the progress of infants and mothers, and interpret these as 
indices of health. 





TABLE XVI 
HEIGHT AND WEIGHT OF PRIMARY SCHOOL CHILDREN 
10 per cent. Sample (121 children), April, 1951; City Averages, 1948-50

Approximate Age          Measurement     Sex      Mean      Standard Error    City Average    City Average
                                                            of the Mean‡      1948-49         1949-50
5-Year-Old “Infants”*    Height (in.)    Boys     42.58     0.382             42.72           42.47
(60)                                     Girls    41.77     0.414             42.26           42.14
                         Weight (lb.)    Boys     42.02     0.909             42.23           42.55
                                         Girls    41.31     0.932             41.01           41.21
9-Year-Old†              Height (in.)    Boys     50.74§    0.422             51.53           51.59
(61)                                     Girls    50.69     0.365             51.03           51.25
                         Weight (lb.)    Boys     61.38§    1.229             63.92           64.04
                                         Girls    60.57     1.034             61.23           62.75








* “Infants” include children aged 5 and usually under 6. In our sample, most of the children’s heights and weights 
were recorded some months after their fifth birthday: some were recorded after their sixth birthday. 

† “9-year-old” children’s heights and weights include those of children aged 9 and usually under 10. In our sample, 
a number of the heights and weights were recorded after the 10th birthday. 

‡ The significance of the difference between a sample mean and the corresponding city average is judged according to 
the standard error of the sample mean. If the difference between mean and city average is more than twice the standard 
error, the probability that this is due to chance is less than 1 in 20: this is taken to be significant. 

§ Mean height and weight of the 9-year-old boys in the sample are significantly below the city average. The means 
for girls of this age are also low. 
The children in our sample were examined by their school doctors at very different 
times, and therefore the school medical records of the children are not quoted in 
detail. But comparable items in the records are quoted, and Table XVI shows that, 
though the 5-year-old children were similar both in height and in weight to the city 
average, the 9-year-old children were below the city average. In the boys, the difference 
in height was just significant and the difference in weight statistically very significant. 
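
The rule given in the footnote to Table XVI (a difference of more than twice the standard error of the sample mean is taken to be significant) can be applied directly to the figures for the 9-year-old boys. The sketch below is illustrative only: the function name `is_significant` is our own, and the standard errors are read as 0.422 and 1.229 on the assumption that the decimal points in the printed table are as reconstructed.

```python
def is_significant(sample_mean: float, city_average: float, standard_error: float) -> bool:
    """Footnote rule of Table XVI: significant if the difference exceeds
    twice the standard error of the sample mean."""
    return abs(sample_mean - city_average) > 2 * standard_error


# 9-year-old boys, Table XVI:
# (sample mean, standard error, city averages for 1948-49 and 1949-50)
boys_9 = {
    "height (in.)": (50.74, 0.422, (51.53, 51.59)),
    "weight (lb.)": (61.38, 1.229, (63.92, 64.04)),
}

for measurement, (mean, se, city_averages) in boys_9.items():
    for year, average in zip(("1948-49", "1949-50"), city_averages):
        ratio = abs(mean - average) / se  # difference in units of standard error
        verdict = "significant" if is_significant(mean, average, se) else "not significant"
        print(f"{measurement}, {year}: {ratio:.2f} standard errors, {verdict}")
```

On this reading the height difference passes the threshold only against the 1949-50 city average, and then narrowly, which agrees with the description of it as "just significant"; the weight difference passes against both years.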

The city notifications of infectious diseases are restricted to certain categories of 
children, and can be used neither for estimating prevalence nor for comparing with 
sample figures. Table XVII shows these notifications during 1948-51 for the sample 
children in the three local primary schools, and shows also the infectious diseases 
recorded by the doctors during the same period. The school records of infectious 
diseases are not comparable, for they relate to the entire life of the children: from 
these it appears that the great majority of the school children in the sample have had 
measles before the age of five. Also, the mothers’ answers to the survey question- 
naire corresponded closely to the school record figures for measles, whooping-cough, 
and chicken-pox. For scarlet fever, however, the mothers’ answers showed a rather 
higher incidence than was noted in the school records. 

TABLE XVII 
INFECTIOUS DISEASES 





Recorded by 





Notifications to the Doctors 
Source of Public Health Department (children in School Records of 
Information (primary school Practice 121 Children 
children in sample) Population) 
Period 1948-51 1948-51 Before April, 1951 
Age as ae Under 5 years 5 or more All Ages Under 5 years 5 or more 
Measles*. . — 14 | 12 98 6 
Whooping Cough* 6 | 6 61 7 
Chicken-pox - 47 4 
Mumps .. a 13 2 
Scarlet Fever... | | I s 
* Measles (and whooping-cough, until recently) is compulsorily notifiable in the city if the child is under 5 years of 
age and is the first case in the household. 

Most of the primary school children were also carefully examined by our own 
dentist; of the 107 children whose teeth were fully inspected, 36 were diagnosed as 
having early parodontal disease. Moreover, the dentist’s estimate of oral hygiene 
gave nearly 20 per cent. as neglected and only 42 per cent. as good. The incidence 
of dental caries among young children is very high (Table XVIII). Only 
32 per cent. had caries-free permanent teeth, while the other 68 per cent. had at least 
one carious permanent tooth and many required four to eight teeth to be filled. The 
deciduous teeth were even more carious, less than 30 per cent. of the children having 
sound ones. It may be of interest to compare these findings with the latest City 
Report on School Health, in one section of which it is stated that 5.37 per cent. of 
the infant boys and 6.24 per cent. of the infant girls examined were considered to have 





144 LILLI STEIN AND 8S. A. SKLAROFF 


TABLE XVIII 
RESULTS OF DENTAL EXAMINATION OF 114 PRIMARY SCHOOL CHILDREN 





Oral Hygiene 











Age Group          Number    Good    Fair    Neglected    Gingivitis
5-7                  29       14      14         1             4
8-10                 50       26      16         8            16
11-13                35        7      15        13            18
All Ages 5-13       114       47      45        22            38

Total Permanent    Children with    Children with Decayed, Missing, or Filled Teeth
Teeth Present      this Number
                   of Teeth         None    1-2     3     4-5    6-7    8 or more
None                    3             -
1-8                    13            10       2     1
9-16                   46            18      15     6      7
17-24                  32             5       7    12      7      1
25 or more             20             2       3     3      4      4        4
All numbers of
permanent teeth       111            35      27    22     18      5        4
Total Children with Children with Decayed, Erupted, or Filled Teeth 
Deciduous Teeth this Number 
Present of Teeth None 1-2 3-4 5 6 7 8 9 or 
more 
None 18 
1-4 30 7 17 6 
5-8 12 6 4 1 1 
9-12 35 3 7 13 4 2 2 2 2 
13-16 10 1 1 5 1 1 1 
17-20 9 1 1 1 1 3 2 
All numbers of 
Deciduous Teeth 96 12 32 29 6 5 5 4 3 





mouth and teeth unhealthy. The figures for the 9-year-old boys and girls were 6.52 
and 5.06 per cent. respectively. In another section it is reported that 76 per cent. 
of the 5 to 17-year-old children who were systematically examined were discovered 
to require dental treatment. 
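
The proportions quoted above for permanent teeth can be recovered from the totals row of Table XVIII. The following is a minimal sketch, assuming that the "None" column of decayed, missing, or filled teeth corresponds to the "caries-free" children of the text; the variable names are ours.

```python
# Totals row of Table XVIII for children with permanent teeth present:
# number of children by count of decayed, missing, or filled teeth.
dmf_counts = {"none": 35, "1-2": 27, "3": 22, "4-5": 18, "6-7": 5, "8 or more": 4}

total_children = sum(dmf_counts.values())

# Share with no decayed, missing, or filled permanent teeth.
caries_free_pct = 100 * dmf_counts["none"] / total_children

# Share with at least one affected permanent tooth.
affected_pct = 100 - caries_free_pct

# Share with four or more affected permanent teeth.
four_or_more = dmf_counts["4-5"] + dmf_counts["6-7"] + dmf_counts["8 or more"]
four_or_more_pct = 100 * four_or_more / total_children

print(f"children with permanent teeth: {total_children}")
print(f"caries-free: {caries_free_pct:.1f} per cent.")
print(f"at least one affected tooth: {affected_pct:.1f} per cent.")
print(f"four or more affected teeth: {four_or_more_pct:.1f} per cent.")
```

The first two shares round to the 32 and 68 per cent. given in the text; the base is the 111 children with permanent teeth, not the full 114 examined.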

The School Health Service dentists visit schools as regularly as possible, and 
inform the parents of children whose teeth need attention by sending them a “consent 
card” offering treatment. Some parents indicate refusal by writing “teeth healthy” 
or “no fillings required” on these cards. Such are the layman’s ideas of dental health 
as compared with the standards of dentists. 

Ideas on what constitutes health were revealed also in answers to questions in the 
household interviews. Certainly the childhood infections were not mentioned unless 
specifically named by interviewers. The familiarity of childhood illnesses is such 
that they seem to be regarded as merely minor disturbances incidental to the growth 
of the child. But other conditions, such as running ears, skin troubles, or continual 
colds, were also considered compatible with perfect health. Of the housewives who 
were asked whether theirs was a healthy family, almost all answered vigorously in 
the affirmative; a very few qualified their affirmative a little, and only one suggested 
a negative answer. Similarly very few said that they bought patent medicines, but, 
when the general question was followed up by questions on aspirins, salts, ointments, 
and so on, it appeared that in four out of five households some form of laxative was 
bought. Aspirins and headache powders ranked nearly as high (Table XIX). 

Laxatives seem to be given to entire families of children automatically as “health 
care” in the traditional Saturday night dose. Persons of all ages, but particularly older 
men, stated that they regularly took laxatives twice a week or more. 


TABLE XIX 
PATENT MEDICINES BOUGHT AND USED BY 5 PER CENT. OF HOUSEHOLDS IN ESTATE (109 FAMILIES) 





(a) Type of Patent Medicine Laxatives* Salts*  Aspirint Ointment Cough Total 
Mixture Families 


No. of Households... 42 4\ 14 12 27 109 




















Total one 
4 5 or more 


tN 
we) 


(b) No. of Medicines Bought 0 l 


N 
nN 


No. of Households om 22 27 21 13 4 87 





* Laxatives (including Salts) were taken occasionally by 17 persons, and regularly by 89 (information regarding children 
was given by the mother). The approximate age distribution of the 89 was: Children, 31; Women, 14 aged under 40, 
“s Vicciin Gon chananie w 2 beh picks ohne by 26 (of the total 46, 18 were young women). 

Several other questions touched on attitudes to health and practices in health 
care. The answers were too diverse for classification, but they revealed again the 
dichotomy between popular concepts of health and sickness. Just as official agencies, 
both in theory and in practice, make a distinction between sickness services and health 
services, a distinction which is reflected in their records, so the habits of the community 
reflect the many divisions which exist between the various aspects of sickness, and 
differentiate even further between any of these aspects and the standard of health. 

The various records have already shown how the different aspects recognized as 
sickness by the different agencies can produce a picture of prevalence which may 
be not only false but misleading. Similarly our household interviews showed that 
these divisions between aspects of sickness led to people becoming involved with one 
hospital after another: the number of agencies with which many of the households 
have been in contact is quite startling. For many families the passage from one hos- 
pital to another department, thence to a clinic, and perhaps again to a hospital, 
must have caused much distress and disturbance. 

The family answers as well as the records underestimate the prevalence of certain 
diseases and overestimate others. Certain diseases have a higher prestige in conver- 
sation and certain recurrent ailments are never mentioned. Prominent among these 
are the “women’s complaints” (mentioned by one informant only and “not 
remembered” by other informants even after prompting); neither does dysmenorrhea 
appear in any record at all: yet it is known to be prevalent in all communities, and 
certainly in this group. A number of women discreetly mentioned “monthly” 
aspirins in answer to the “patent medicine” questions. To some extent mention of 
intestinal or gastric complaints was also evaded, and ulcers, though quite “respectable”, 
were mentioned by only four of the 21 sufferers. 

Manifestly the dichotomy between the concept of health and that of sickness 
exists in official as well as in popular thought and practice; yet it is a different 
dichotomy. Official concepts seem to be based not on medical or welfare standards 
but on the involvement of the agency and the eligibility of the applicant for benefit. 
The community also judges health and sickness according to non-medical criteria. 
Attitudes to sickness seem to depend on what is “normal” by group standards and 
experience, and on what might appear as conspicuous in the group. In addition, 
the family tends to judge sickness by the conspicuous departure of an individual 
from familiar disability or habitual ailment. Thus both group and family norms 
influence the conduct of people faced with unusual illness; and also, the answers 
given to interviewers are based on the family attitude to “strangers”. Group atti- 
tudes to health are more nebulous: health has prestige in this community, particularly 
in the verbal response to outsiders. Thus the criteria of health are completely different 
from the criteria of sickness. Being healthy depends on group norms of experience 
and approval as well as on physical fitness. Judgments of health are based also on 
personal norms of habit, of accepted disability, of the capacity to function in house- 
hold or occupational routine, and of the frequency with which illness interrupts such 
a state of health. 

It is not surprising that the popular attitude to sickness should be so far separated 
from the facts of disability. Most disabilities in this group are of the type which 
advance gradually and to which the sufferers become adapted. Moreover, there is no 
agency, either in this estate or in most other communities, which caters for the “not 
fully healthy” by offering remedial services which do not disrupt daily life or earning 
capacity. For the person who needs treatment but who is not conspicuously ill, 
regular remedial measures which can be fitted into normal working life are almost 
impossible to obtain. It is natural therefore that most people continue to live with 
their disabilities, continue to adapt themselves for as long as possible while those 
disabilities do not become so conspicuous that they demand action. The popular 
concept of sickness must inevitably be influenced by the financial difficulties of being 
unfit, and by the remoteness of convenient remedy. If health services do not offer 
suitable provision for those in a state of impaired health, it is to be expected that the 
ailing will not regard themselves as sick. 


DISCUSSION 
The measurement of sickness in the population is the indispensable foundation 
for the proper provision for future needs, and the measurement of the intermediate 
stages between sickness and full health is equally essential for planning new services. 
The actual needs of a community are often not revealed until some provision for 
treatment or for health promotion has become established. Provision made on the 
basis of past recorded sickness only can lead to serious under-estimation of the 
extent to which new services will be used. 

But measurement of sickness and of degrees of unfitness cannot be attempted at 
national or large-scale levels unless very full local knowledge is already available. Indices 
of health or sickness have little meaning if they are only averages for large numbers 
of people who have no common life and environment. The measures of health and 
sickness must be evolved in relation to a community group and its characteristics: 
whether the community is large or small, stable or changing, what population com- 
position it has, whether the married women work, whether the different generations 
are crowded together in the same households, and whether local social and economic 
conditions encourage certain ailments. Such characteristics form only part of the 
required knowledge. Equally needed is knowledge of endemic or chronic diseases, 
of attitudes to sickness and the use made of provided services, of attitudes to health 
and the possibilities of health education or promotion, of the working hours and 
leisure habits against which the provision of remedial treatment might be planned. 

The problem of obtaining measurements which have meaning is similarly bound 
up with the attitudes and standards of the doctors, specialists, and officials who have 
authority in the group. “Doctor norms” influence both the behaviour of a group in 
its use of medical services in cases of “real illness” and the possibility of measurement 
from the records kept. Even the current definitions of health and the recognition 
of sickness depend on medical attitudes and advances: they form the background 
against which the prestige of health can change, against which some illnesses develop 
high standing whilst others become almost taboo. 

Thus measurement is related not only to sickness experience and the difficulties of 
defining and recording it, but also to attitudes and social characteristics. The future 
needs of a group are also related to these circumstances and characteristics, and 
national estimates of needs can only be based on intensive local studies. The demo- 
graphic structure of any group is the framework in which the future population 
develops, and the conditions of the present foretoken future socio-medical needs. 

In the community from which our sample was drawn these points are amply 
illustrated. Group standards of “normal” health affect recognition of sickness and 
action taken. Equally they influence accuracy of measurement, particularly that 
obtained by survey interviewing when answers to questions are biased by attitudes 
to interviewers as well as by interpretations of sickness symptoms and labels. Official 
records give measurements biased in other directions, for they measure not sickness 
but those conditions which are conspicuous and eligible from their particular stand- 
point. One cannot say that the needs of this community are underprovided, for no 
needs are expressed: yet real needs are seen when the picture of composite sickness, 
of chronic ailments, and of impaired health is examined. 

The present age-structure of this group and the social and economic difficulties 
of the elderly and of the widowed or separated women disclose some of its special 
needs. The number of middle-aged and elderly people referred to hospitals, the high 
incidence of respiratory and gastro-enteric diseases, and of chronic sickness, all bear 
witness to the need for local provision directly related to local ill-health. 

The age-structure and housing circumstances of this group must lead to an even 
greater need of medical care in the future, as the young married couples find houses 
of their own outside the estate, and increasing numbers of elderly couples or widowed 
persons are left as sole occupants. The higher proportions of elderly people will in 
themselves necessitate greater provision for this category. The elderly may be permitted 
to take lodgers in order to fill the house and to supplement their pensions: on the other 
hand, the houses may be required for younger families whose need for re-housing is 
great. Priorities at present include some due to illness: the estate already contains 
a number of families re-housed owing to overcrowding associated with illness. If 
this tendency continues, the population will come to contain even more families 
with known chronic or infectious disease. 

Another problem is that of three-generation households in which young couples 
are bringing up their babies. The young mother is inevitably influenced by living 
in the house of her mother or mother-in-law. To what extent the attitudes of the 
grandparent generation would affect the young mother if she were in her own home 
one cannot say: it is certain that when she has her first child in the maternal home 
the dominant influence is that of the older woman who has “brought nine children 
into the world, my girl”. Attitudes to health and ante-natal care, to infant feeding, and 
to “dosing” of children are in many households those of the grandparent generation. 

From the material collected here, certain conclusions are clearly indicated: 


(1) The problems of measuring sickness and of assessing present and future needs 
in the context of the actual group circumstances require: 

(a) Definitions of sickness and systematic classification of “stages” of health from which 
the composite “total” sickness in the group can be compounded; 

(b) Intensive study of the group and intimate knowledge of the many sides of group life 
and standards, so that the attitudes and behaviour in relation to ill-health can be understood. 

(2) Sickness records for the unit of intensive study, the small community group, 
should be built up on the basis of the household rather than of the individual. The 
main doctors on this estate have long since arrived at the practice of keeping a family 
index, from their experience of the best and most relevant method of working in the 
community. It may be noted also that the non-medical agencies which give financial 
or social assistance keep sickness data for the whole family together and see sickness 
in any member of the family as affecting the family welfare. 

(3) The classification system of stages of “impaired health” is suggested as one 
which can start with the person experiencing disability and can take in each stage of 
ill-health from that point. The first stage of “impaired health” is that in which minor 
disability is experienced intermittently, and such minor disability will often not be 
seen by a doctor: at present it can only be discovered by surveys and family interviews. 
Nor may the next stage, that of recurring disability, be brought to a doctor, though 
on the other hand, the stage of ill-health which is brought to the doctor may be much 
less serious. It is evident that a systematic classification must be worked out, and 
that co-operation between health workers’ methods of recording stages of impaired 
health must be constantly maintained, if community sickness is really to be measured. 

But the present knowledge of group differences in respect of sickness attitudes and 
actions is quite inadequate for any systematic classification to be established. Many 
intensive studies of many groups in different localities are required if such knowledge 
of the volumes and types of sickness is to become available. 

Certain ailments in particular seem to require fuller study and clearer definition. 
For example, many people were described by the doctors on the estate as “neurotic”, 
yet the judgments on which such descriptions were based varied greatly. 

From the composite picture of sickness in the sample the present state of health 
of this estate community can be assessed. It seems probable that some of the health 
problems are associated with the population structure which has developed from the 
originally-rehoused families and from the outward migration of the children. The 
wisdom of “setting up a community” by filling a large number of newly-built houses 
with a very selected group of families is debatable, since in relieving one urgent need, it 
may cause serious socio-medical problems to arise in the future. 

These matters are by no means of local importance only. The housing estates in 
Britain are numerous, and many of them have grown up like this one—by initial 
settlement at one period in time, by the selection of a definite category of the general 
population, by location at some distance from the centre of the city, of families of 
approximately similar size and age, in houses uniformly built and equipped. With the 
present need for houses still more “estates” may be created. If local authorities 
can only take future population developments into account when these estates are 
peopled, the socio-medical difficulties which can evolve in a selected population 
after a period of 10 to 20 years may be avoided. 


SUMMARY 


(1) A socio-medical survey was made of a housing estate where families were 
re-housed 16 years ago. The survey was undertaken as a pre-pilot enquiry in connection 
with a long-term research programme for the study of a “social disease”. The 
need to study the social group in which a disease occurs and spreads, rather than the 
“diseased” and “controls”, is emphasized. 

(2) Four aspects of community life—housing occupancy, general health ex- 
perience, food habits, and leisure activities—were investigated. The information 
obtained in household interviews was related to the data obtained from all possible 
records. One aim of the enquiry was to compare and appraise the different pictures 
of ill-health as described by different records or by sample survey interviewing. 

(3) The demographic structure differs from that of the city population as a whole. 
The age distribution is bi-modal, the majority of the main householders being elderly. 
There are marked deficits in the age-groups under 10, and between 25 and 39. 

(4) The structure of the households seems to be associated with both the original 
selection of the families and the present housing shortage. More than one quarter 
of the households contain, in addition to the main family, a sub-family of a young 
married son or daughter. More than 18 per cent. are three-generation households. 

The resulting demographic problems are indicated. The developments to be 
expected as young couples find houses outside the estate indicate still greater problems 
in the future. 

(5) The levels of sickness as estimated from each set of records are described, 
and compared with each other and with the picture obtained from the survey inter- 
views. The differences between the proportions of people absent from work because of 
illness and the proportions referred to hospital are greatest for men aged 20 to 29 and 
women aged 20 to 24, and are also large in other age-sex groups. 

The proportions of people referred to hospital, when compared with those claim- 
ing sickness benefit, indicate that absence from work owing to illness is not excessive. 

(6) The diseases prominently represented in the different records differ somewhat, 
but injuries and respiratory disorders form a considerable proportion of each. 

(7) The descriptions of personal and family health given in the survey interviews 
indicate that most families do not regard their ailments as “illness” except when they 
cause conspicuous disturbance. The reliability of sickness estimates based on survey 
interviews is questionable, not only because of memory difficulties, but more noticeably 
because of divergencies in community definitions of sickness and differences in group 
perceptions of “conspicuous sickness”. 

(8) Additional data are presented on the incidence of recurrent and chronic 
ailments as observed by the doctors. In all age-sex groups the proportions suffering 
from recurrent conditions are considerable, and increase with age. Nearly 80 per cent. 
of the sample households contain at least one person with recurrent chronic ailment. 

(9) The high incidence of respiratory disorders is also seen in the recurrent 
ailments. Certain other conditions, especially skin diseases in both sexes, and 
anaemia, neuroses, and obesity in women, are more prevalent than the records show. 

(10) The health of the primary school children is considered in terms of heights 
and weights, childhood infectious diseases, and condition of teeth. The incidence of 
dental caries is very high, even in the permanent teeth of young children; only one- 
eighth of the children had sound deciduous teeth. 

(11) The dichotomy between the concept of health and the concept of sickness is 
apparent both in official and in popular thought. Measurements of sickness from 
administrative records cannot be regarded as indices of health. 

(12) Measurements of sickness in different communities may differ because of 
their relation to group characteristics and attitudes, and can only be interpreted in the 
light of detailed knowledge of local norms. The accumulation of such local knowledge 
requires intensive studies of many other small communities. 

(13) Some unified system of sickness classification is needed to relate the 
different concepts of ill-health to each other and make all records of sickness 
contiguous and comparable. A common set of definitions should include the first 
stage of “impaired health” from its occurrence in the individual, and embrace the 
various stages of ill-health as treated by medical and other agencies. 

(14) There are a number of socio-medical and health problems in this housing-estate 
community. The significance of these findings needs to be considered in relation 
to the probable increase in the number of re-housed communities in the future. 


It is with pleasure that we express our deep gratitude to Professor F. A. E. Crew for his vision 
in the initiation of this work and his encouragement in its execution. Grateful thanks are also offered 
to the many national and local authorities for their ready co-operation, to the Edinburgh Public 
Health Department and its staff for their generous assistance, to the Health Visitors for their untiring 
energy, to the doctors for their guidance and consistent interest, to the hospitals whose records were 
made available, and to all the people whose work went into this survey. 


OBSERVATIONS ON ALL BIRTHS (23,970) 
IN BIRMINGHAM, 1947* 


VI. BIRTH WEIGHT, DURATION OF GESTATION, AND 
SURVIVAL RELATED TO SEX 
BY 
J. R. GIBSON AND THOMAS McKEOWN 
From the Department of Social Medicine, University of Birmingham. 


In this report we use data for births delivered in Birmingham during 1947 (Gibson 
and McKeown, 1950) to examine sex differences 

(a) in birth weight and duration of gestation 

(b) in survival rates. 


BIRTH WEIGHT AND DURATION OF GESTATION 
That mean birth weight is higher for males than for females has been consistently 
recorded (for example by Pearson, 1900; Murray, 1924; Martin, 1931; Bakwin and 
Bakwin, 1934; Anderson, Brown, and Lyon, 1943), and more recently by Karn and 
Penrose (1951), Norval, Kennedy, and Berkson (1951), and Salber and Bradshaw 
(1951). There have been fewer investigations of duration of gestation; but in general 
reported differences between the two sexes have been trivial (Schlichting, 1880; 
Siegel, 1921; Anderson, Brown and Lyon, 1943; Karn, 1947; Karn and Penrose, 1951). 
Table I gives mean birth weights of the Birmingham births as 7.57 and 7.31 lb. 
for males and females respectively. In comparing these figures with others reported 
in the literature it should be remembered that about half the births were delivered 
in hospitals and nursing homes and about half at home, and that mean birth weight 
is approximately half a pound heavier for domiciliary (7.65) than for hospital births 
(7.15) (McKeown and Gibson, 1951). If the comparison were confined to institutional 
births the means would no doubt be very close to those recorded recently by 
Karn and Penrose (1951) for University College Hospital, London (7.27 for males; 
7.06 for females). 


TABLE I 
PERCENTAGE DISTRIBUTIONS BY BIRTH WEIGHT 

Birth Weight (lb.)          Male     Female 
Under 4                     1.32       1.24 
4–                          1.59       1.92 
5–                          5.88       8.07 
6–                         20.52      25.91 
7–                         34.92      36.18 
8–                         24.51      20.01 
9–                          8.58       5.22 
10 and over                 2.68       1.45 
Total                     100        100 
Number of Births          11,602     10,811 
Mean                        7.57       7.31 
Error of Mean               0.012      0.012 
Standard Deviation          1.31       1.24 

Sex was unspecified in 41 of 22,454 births of known weight. 

* This research was assisted by a grant from the Birmingham University Students’ Social Services 
Fund. 
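
The “Error of Mean” entries in Table I agree with the usual standard error of a mean, the standard deviation divided by the square root of the number of births, although the paper does not state the formula. The following is a minimal sketch under that assumption, treating the two sexes as independent samples; the function and variable names are illustrative and are not the authors'.

```python
import math

# Figures from Table I: number of births, mean weight (lb.), standard deviation (lb.)
TABLE_I = {
    "male": {"n": 11602, "mean": 7.57, "sd": 1.31},
    "female": {"n": 10811, "mean": 7.31, "sd": 1.24},
}


def standard_error(sd, n):
    """Standard error of a mean, assuming independent observations."""
    return sd / math.sqrt(n)


def difference_ratio(a, b):
    """Difference of two means divided by the standard error of that difference."""
    se_diff = math.hypot(standard_error(a["sd"], a["n"]),
                         standard_error(b["sd"], b["n"]))
    return (a["mean"] - b["mean"]) / se_diff


se_male = standard_error(TABLE_I["male"]["sd"], TABLE_I["male"]["n"])
se_female = standard_error(TABLE_I["female"]["sd"], TABLE_I["female"]["n"])
ratio = difference_ratio(TABLE_I["male"], TABLE_I["female"])

print(f"standard errors: male {se_male:.3f}, female {se_female:.3f}")
print(f"difference / standard error of difference: {ratio:.1f}")
```

On these assumptions the sex difference in mean birth weight is many times its standard error, in contrast with the difference in duration of gestation discussed next.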


Table II gives the mean duration of gestation (in days) as 280.29 for males and 
280.78 for females; the difference between the means is just significant by the 
conventional tests, although it is of course quite trivial. In particular, and what is of 
most biological interest, it in no way accounts for the sex difference in weight at 
birth, which must be wholly attributed to a difference in the rate of foetal growth 
(unlike the difference in weights of twin and single births, which is partly attributable 
to a difference in rate of foetal growth, and partly to a substantial difference in the 
duration of gestation). 
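
The phrase “just significant” can be illustrated from the means and errors printed in Table II. The sketch below assumes that the “conventional tests” amount to comparing the difference with about twice its standard error (a two-sided normal test at the 5 per cent. level); the paper does not name the test, and the variable names are illustrative.

```python
import math

# Figures from Table II (days)
MALE_MEAN_DAYS, FEMALE_MEAN_DAYS = 280.29, 280.78
SE_MALE = SE_FEMALE = 0.17      # "Error of Mean"
SD_DAYS = 15.75                 # standard deviation, the same for both sexes

# Standard error of the difference between two independent means.
se_difference = math.hypot(SE_MALE, SE_FEMALE)
difference = FEMALE_MEAN_DAYS - MALE_MEAN_DAYS
ratio = difference / se_difference

print(f"difference = {difference:.2f} days")
print(f"difference / standard error = {ratio:.2f}")
print(f"difference as a fraction of one standard deviation = {difference / SD_DAYS:.3f}")
```

The ratio falls only a little above the conventional threshold of 1.96, while the difference itself is a small fraction of one standard deviation, which is the sense in which it is both significant and trivial.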


TABLE II 
PERCENTAGE DISTRIBUTIONS BY DURATION OF GESTATION 

Duration of Gestation (weeks)     Male     Female 
Under 34                          2.09       2.04 
34–                               2.62       2.55 
36–                               7.38       6.62 
38–                              29.01      28.13 
40–                              44.74      46.73 
42–                              11.53      10.98 
44 and over                       2.63       2.95 
Total                           100        100 
Number of Births                 8,765      8,306 
Mean (days)                     280.29     280.78 
Error of Mean                     0.17       0.17 
Standard Deviation               15.75      15.75 





Table III and the Figure (overleaf) show the mean birth weights of males and 
females according to the duration of gestation. At all intervals examined, males are 
heavier than females, although in many cases the differences are not significant. It is 
of course to be expected that the sex difference in foetal weight would become more 
conspicuous during the period of rapid growth, but on this evidence the difference 
is present as early as the 29th week. 


TABLE III 
BIRTH WEIGHT RELATED TO DURATION OF GESTATION 

Duration of           Male                               Female 
Gestation             Number   Mean   Error   Standard   Number   Mean   Error   Standard 
(weeks)               of       Weight of      Deviation  of       Weight of      Deviation 
                      Births   (lb.)  Mean               Births   (lb.)  Mean 
28                        18   2.78   0.15    0.64           14   2.71   0.11    0.43 
30                        34   3.56   0.20    1.16           39   3.37   0.18    1.14 
32                        64   4.33   0.12    0.97           56   4.09   0.17    1.25 
34                       193   5.90   0.09    1.29          184   5.86   0.10    1.39 
36                       647   6.80   0.05    1.23          550   6.60   0.05    1.27 
38                     2,543   7.43   0.02    1.13        2,336   7.17   0.02    1.11 
40                     3,921   7.84   0.02    1.04        3,882   7.55   0.02    1.08 
42                     1,010   7.98   0.04    1.18          910   7.68   0.04    1.31 
44–46                    182   7.99   0.09    1.17          166   7.80   0.08    1.00 
All births of known 
duration of gestation  8,612   7.56   0.014   1.28        8,137   7.30   0.014   1.29 
All births of known 
birth weight          11,602   7.57   0.012   1.31       10,811   7.31   0.012   1.24 
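
The remark that males are heavier at every interval, though often not significantly so, can be checked from the means and errors in Table III. The sketch below assumes the same rough criterion as before (a difference exceeding twice its standard error); the first row is read as 28 weeks, and the names used are illustrative rather than the authors'.

```python
import math

# (weeks, male mean, male error of mean, female mean, female error of mean), Table III.
# The first row label is read here as 28 weeks.
TABLE_III = [
    ("28", 2.78, 0.15, 2.71, 0.11),
    ("30", 3.56, 0.20, 3.37, 0.18),
    ("32", 4.33, 0.12, 4.09, 0.17),
    ("34", 5.90, 0.09, 5.86, 0.10),
    ("36", 6.80, 0.05, 6.60, 0.05),
    ("38", 7.43, 0.02, 7.17, 0.02),
    ("40", 7.84, 0.02, 7.55, 0.02),
    ("42", 7.98, 0.04, 7.68, 0.04),
    ("44-46", 7.99, 0.09, 7.80, 0.08),
]


def difference_ratios(rows):
    """Male minus female mean, divided by the standard error of the difference."""
    return {
        weeks: (m_mean - f_mean) / math.hypot(m_se, f_se)
        for weeks, m_mean, m_se, f_mean, f_se in rows
    }


RATIOS = difference_ratios(TABLE_III)
for weeks, ratio in RATIOS.items():
    verdict = "exceeds 2" if ratio > 2 else "below 2"
    print(f"{weeks:>5} weeks: ratio {ratio:5.2f} ({verdict})")
```

Every ratio is positive, but on this criterion only the intervals with large numbers of births show a difference clearly beyond sampling error.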
[Figure: curves for MALES and FEMALES; horizontal axis, DURATION OF GESTATION (WEEKS), 28 to 44.] 
Fig. 1. Birth weight related to duration of gestation. 


SURVIVAL 

Tables IV and V (overleaf) show the association of sex specific stillbirth and neo- 
natal mortality rates with birth weight and duration of gestation respectively; because 
the numbers are small for this purpose the data have been grouped. Both rates are 
higher for males than for females, the slight differences between the corresponding 
figures in the two tables being explained by the different number of births for which 
weight and duration of gestation were known. We now enquire to what extent 
the sex differences are accounted for by differences in birth weight, duration of 
gestation, and cause of death. 


SEX SPECIFIC STILLBIRTH RATES 


(a) For the relatively small number of births delivered at weights less than 3 lb., 
female stillbirth rates are higher than male rates (Table IV). The difference is 
attributable almost entirely to the higher incidence in this weight group of female 
stillbirths due to anencephalus. Similarly when considered in relation to duration 
of gestation, the sex specific stillbirth rate for births delivered before completion of 
38 weeks’ gestation is higher for females than for males (Table V), again because of 
the different sex incidence of anencephalus. The higher female stillbirth rate is 
exhibited by births delivered in each fortnight from 28 to 38 weeks (examined by 
subdividing the group shown in Table V). 

The association of anencephalus with early onset of labour is of course well 
recognized, and since the proportion of males affected is considerably lower than the 
proportion of females, this will naturally influence the sex specific stillbirth rates. 
Moreover it has recently been shown that the sex ratio (expressed as percentage of 
males) of stillbirths due to anencephalus increases with duration of gestation (McKeown 
and Lowe, 1951a). The marked effect of this malformation upon sex specific stillbirth 
rates of births delivered early or at low weights is therefore explained partly by its 
high incidence relative to other causes of stillbirth, and partly by its low sex ratio 
relative to anencephalics delivered later in gestation. 


(b) For births weighing from 3 to 5 lb., male stillbirth rates are higher than female 
rates, mainly because of the much higher incidence of male stillbirths grouped under 
“all other causes”. These causes include maternal disease, toxaemia, and constriction 
of the cord, but the group consists largely of stillbirths due to ill-defined and unknown 
causes, many of which on grounds of low weight would commonly be attributed to 
“prematurity”. 


(c) For births of 6 lb. and over, male stillbirth rates are again higher than female 
rates; the difference is comparatively small, and is due to higher numbers of male 
stillbirths attributed to “mechanical causes” and to “all other causes” about equally. 
Indeed, in another context, McKeown and Lowe (1951b) concluded that above 
3500 g. (approximately 7¾ lb.) there was no significant difference in sex specific 
stillbirth rates of males and females delivered at the same weight. In arriving at this 
conclusion they considered the 1947 Birmingham stillbirth rates, but were influenced 
mainly by data for New York (published by Baumgartner, Pessin, Wegman, and 
Parker, 1950) based on much larger numbers. 

TABLE IV 
STILLBIRTH AND NEO-NATAL MORTALITY RELATED TO BIRTH WEIGHT 

Stillbirth Rates* attributed to 

Birth                      Total                 Other          Mechanical  All Other  All 
Weight (lb.)  Sex          Births  Anencephalus  Malformations  Causes      Causes     Causes 
Under 3       (a) Male         70  6             3              [illeg.]    [illeg.]   [illeg.] 
              (b) Female       62  23            3              –           32         58 
              (a) - (b)         –  -17           –              3           [illeg.]   [illeg.] 
3–            (a) Male        951  0.6           1.0            1.6         7.3        10.5 
              (b) Female    1,154  0.8           1.0            0.9         4.8        7.5 
              (a) - (b)         –  -0.2          –              0.7         2.5        3.0 ± [illeg.] 
6 and over    (a) Male     10,581  0.02          0.10           0.51        0.83       1.46 
              (b) Female    9,595  0.01          0.10           0.34        0.67       1.12 
              (a) - (b)         –  0.01          –              0.17        0.16       0.34 ± [illeg.] 
Total         (a) Male     11,602  0.10          0.19           0.61        1.56       2.46 
              (b) Female   10,811  0.22          0.21           0.40        1.29       2.12 
              (a) - (b)         –  -0.12         -0.02          0.21        0.27       0.34 ± 0.20 

Neo-Natal Mortality Rates† attributed to 

Birth                      Total Live                 Mechanical             All Other  All 
Weight (lb.)  Sex              Births  Malformations  Causes      Infection  Causes     Causes 
Under 3       (a) Male             40  3              17          5          60         85 
              (b) Female           26  –              11          4          54         69 
              (a) - (b)             –  3              6           1          6          16 ± 10 
3–            (a) Male            851  0.6            2.9         0.9        3.4        7.8 
              (b) Female        1,068  0.3            1.6         0.9        3.2        6.0 
              (a) - (b)             –  0.3            1.3         –          0.2        1.8 ± 1.2 
6 and over    (a) Male         10,426  0.13           0.27        0.24       0.48       1.12 
              (b) Female        9,488  0.09           0.19        0.17       0.31       0.76 
              (a) - (b)             –  0.04           0.08        0.07       0.17       0.36 ± 0.14 
Total         (a) Male         11,317  0.18           0.53        0.31       0.91       1.93 
              (b) Female       10,582  0.11           0.36        0.26       0.72       1.45 
              (a) - (b)             –  0.07           0.17        0.05       0.19       0.48 ± 0.17 

* Stillbirths per hundred total births. 
† Deaths in the first month per hundred live-births. 


TABLE V 
STILLBIRTH AND NEO-NATAL MORTALITY RELATED TO DURATION OF GESTATION 

Stillbirth Rates* attributed to 

Duration of 
Gestation                  Total                 Other          Mechanical  All Other  All 
(weeks)       Sex          Births  Anencephalus  Malformations  Causes      Causes     Causes 
Under 38      (a) Male        956  [illeg.]      0.6            0.8         5.2        7.2 
              (b) Female      843  2.4           0.9            0.8         5.6        9.7 
              (a) - (b)         –  -1.8          -0.3           –           -0.4       -2.5 ± 1.3 
38 and over   (a) Male      7,656  0.04          0.16           0.53        1.16       1.89 
              (b) Female    7,294  0.01          0.11           0.34        0.90       1.36 
              (a) - (b)         –  0.03          0.05           0.19        0.26       0.53 ± [illeg.] 
Total         (a) Male      8,612  0.10          0.21           0.57        1.62       2.50 
              (b) Female    8,137  0.26          0.20           0.39        1.37       2.22 
              (a) - (b)         –  -0.16         0.01           0.18        0.25       0.28 ± 0.23 

Neo-Natal Mortality Rates† attributed to 

Duration of 
Gestation                  Total Live                 Mechanical             All Other  All 
(weeks)       Sex              Births  Malformations  Causes      Infection  Causes     Causes 
Under 38      (a) Male       [illeg.]  [illeg.]       [illeg.]    1.1        4.4        8.5 
              (b) Female          761  0.3            1.8         0.9        3.4        6.4 
              (a) - (b)             –  0.6            0.3         0.2        1.0        2.1 ± 1.3 
38 and over   (a) Male       [illeg.]  [illeg.]       [illeg.]    [illeg.]   0.28       1.06 
              (b) Female        7,195  0.21           0.15        0.24       0.36       0.96 
              (a) - (b)             –  –              [illeg.]    –          -0.08      0.10 ± 0.16 
Total         (a) Male          8,397  0.28           0.52        0.33       0.73       1.86 
              (b) Female        7,956  0.27           0.31        0.30       0.66       1.48 
              (a) - (b)             –  [illeg.]       0.21        0.03       0.07       0.38 ± 0.20 

* Stillbirths per hundred total births. 
† Deaths in the first month per hundred live-births. 


In two respects the data here considered are not entirely satisfactory. In the first 
place, the number of births makes it necessary to use coarse weight groups, with the 
result that we are not strictly comparing births of the same weight, males in any 
weight group being somewhat heavier than females. Secondly, except in the case 
of the congenital malformations, and possibly of “mechanical causes” of stillbirth, 
the certification of cause of stillbirth is unreliable, and consequently no attempt has 
been made to subdivide the group entitled “all other causes”. This means that where 
small differences in sex specific stillbirth rates are observed we cannot be certain that 
they are not influenced by grouping, and that where substantial differences exist 
we are unable adequately to explore the cause of death. In spite of these limitations 
it seems justified to conclude that the difference in sex specific stillbirth rates (all 
weights) reflects: 

(i) higher female stillbirth rates at the lowest weights (under 3 lb.), attributable 
to the prominence of anencephalus as a cause of female stillbirths; 

(ii) higher male stillbirth rates between 3 and 5 lb.; no reliable explanation can be given; 

(iii) slight or possibly no difference (in view of the New York data referred to 
above) between sex specific stillbirth rates at 6 lb. and over; 

(iv) sex differences in weight distributions of related total births, attributable to the 
different growth rates of male and female foetuses (see Figure). 
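
The “±” figures attached to the differences (a) - (b) in Tables IV and V are not explained in the text. They are consistent with a single binomial standard error of the difference between two rates, and the sketch below reproduces the “Total” row of Table IV on that assumption. The formula and the function name are assumptions, not the authors' stated method.

```python
import math


def rate_difference(rate_a, n_a, rate_b, n_b):
    """Difference of two rates (per hundred) and its binomial standard error (per hundred)."""
    p_a, p_b = rate_a / 100, rate_b / 100
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rate_a - rate_b, 100 * se


# Table IV, "Total" row, all causes.
# Stillbirths per hundred total births (males 11,602 births, females 10,811 births).
still_diff, still_se = rate_difference(2.46, 11602, 2.12, 10811)
# Neo-natal deaths per hundred live births (males 11,317, females 10,582).
neo_diff, neo_se = rate_difference(1.93, 11317, 1.45, 10582)

print(f"stillbirth rate difference: {still_diff:.2f} with standard error {still_se:.2f}")
print(f"neo-natal rate difference:  {neo_diff:.2f} with standard error {neo_se:.2f}")
```

If this reading is right, the overall difference in stillbirth rates is less than twice its standard error, whereas the difference in neo-natal mortality is somewhat more than twice its standard error.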


SEX SPECIFIC NEO-NATAL MORTALITY RATES 

There is little to be said about the sex specific neo-natal mortality rates shown in 
Tables IV and V, except that in every group (by weight and duration of gestation) 
male rates are higher than female rates. The same result is obtained when the data 
are further subdivided into 1 lb. groups for weight, and into fortnightly groups for 
duration of gestation. We are of course again confronted with the difficulties 
referred to above (small numbers and unreliable certification) and can draw no firm 
conclusions about the cause of the higher male death rates. 


SUMMARY 
(1) For births delivered in Birmingham during 1947, mean birth weights (in 
pounds) were: male, 7.57; female, 7.31. Mean durations of gestation (in days) were: 
male, 280.29; female, 280.78. The sex difference in birth weight is exhibited by 
births delivered as early as the 29th and 30th weeks. 
(2) The male stillbirth rate is higher than the female rate, and the sex difference 
is shown to be associated with: 

(i) higher female stillbirth rates under 3 lb., explained by the prominence of 
anencephalus as a cause of early female stillbirth; 

(ii) higher male stillbirth rates between 3 and 5 lb.; no adequate explanation is given; 

(iii) slight or possibly no difference between sex specific stillbirth rates of males and 
females of 6 lb. and over of the same weight; 

(iv) sex differences in weight distributions of related total births, attributable to the 
different growth rates of male and female foetuses. 


(3) The male neo-natal mortality rate is higher than the female rate. The sex 
difference is still present when births in the same weight group are compared. 


On the Objective Study of Crowd Behaviour. By L. S. PENROSE. Pp. 74. H. K. Lewis, London. 
1952. (10s.) 


In this brochure Professor Penrose puts forward one novel, challenging, and highly 
original idea buried in a banal matrix of tedious metaphor and metalepsis about the 
behaviour of men in groups and their reactions to micro-organisms and viruses. Those who 
fail to derive any profit from the analogy between crowd diseases as Greenwood uses the 
term and crowd disorders as Penrose does may also miss the point which makes the 
publication of the essay more than worth while. True to the Galton Laboratory tradition, 
the author assumes that the reader, if also a mathematician, will immediately grasp the 
statistical theory he advances; and, if not, will be too dumb to do so. This is a pity, because 
a public of thoughtful people is getting more and more suspicious of statistical generalizations 
advanced for allegedly adequate theoretical reasons when there is, as for the so-called cube 
law, merely a somewhat exiguous empirical basis to support them. 

The Penrose square-root law has also to do with voting; and what follows is an attempt 
to fill in the argument which the author himself does not deign to elaborate. The elaboration 
is all the more pertinent because his hope that the reader “will tolerate the necessary 
introduction of mathematical notation” (p. 6) immediately precedes three gross errors in the 
formulae which follow, viz.: 


K/\/n, on p. 7 should presumably be Ky 17,, 


n,—1/n, should presumably be 7, ny, 


the differential element in the definite integral on the same page is 
missing; and $\sqrt{2}\,\pi$ in this formula should be $\sqrt{2\pi}$. 

Fortunately, the Table at the foot of p. 7 permits one to reconstruct the line of thought 
from which the author’s pen has erred so consistently. One may state it as follows: If a 
collection of $n$ indifferent voters registers choice at random on a yes-no issue, the 
distribution of ayes or the reverse tallies with the terms of the binomial 
$(\frac{1}{2} + \frac{1}{2})^{n}$. If we denote the number of ayes by the score $x$, its 
deviation from the mean is $(x - \frac{1}{2}n) = X = \frac{1}{2}(2x - n)$. 
The excess of ayes (majority, if positive) is $x - (n - x) = (2x - n)$, so that $2X = m$, if 
positive, is the majority for the affirmative. The variance of the score distribution is 
$\frac{1}{4}n$, so that the square standard score corresponding to $x$ is 

$$\frac{X^{2}}{\frac{1}{4}n} = \frac{4X^{2}}{n} = \frac{m^{2}}{n} \qquad \text{(i)}$$ 

If $n$ is fairly large (e.g. 20 or over) we may treat this as a $\chi^{2}$ variate whose mean value 
is unity. Thus the mean value of $m^{2}$ should be $n$, if the $n$ voters acted independently at random. 
If observation of many elections in presumptively stable conditions shews that the mean 
value of $m^{2}$ for an observed mean value ($n_{m}$) of $n$ is in fact $k^{-1} n_{m}$, the voting system operates 
as if $k^{-1} n_{m}$ individuals replace $n_{m}$, or as if $k$ individuals on the average behave as a block. 
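
The statement that the mean value of the squared majority equals the number of voters can be verified exactly for any finite electorate. The sketch below sums over the binomial distribution of the number of ayes with exact rational arithmetic; the function name is illustrative and nothing in it is taken from Penrose's essay.

```python
from fractions import Fraction
from math import comb


def mean_square_majority(n):
    """Exact mean of m^2 = (2x - n)^2 when x follows the binomial (1/2 + 1/2)^n."""
    total = sum(comb(n, x) * (2 * x - n) ** 2 for x in range(n + 1))
    return Fraction(total, 2 ** n)


for voters in (20, 101, 1000):
    print(voters, mean_square_majority(voters))
```

Dividing by the number of voters gives a mean of unity for the square standard score, in agreement with equation (i).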

This aspect of the Penrose square-root law implicit in equation (i) is of secondary 
importance to the way in which a small block acting as one individual can control most 
decisions. In addition to the $n$ indifferent voters we postulate in the derivation of (i), we may also 
postulate a compact block of $b$ individuals who will always vote against the ayes, so that 
the ayes will not get a majority unless $m > b$; i.e. 

$$c > \frac{b}{\sqrt{n}} \qquad \text{(ii)}$$ 

With the value of $c$ so defined, the probability that the ayes will not get a majority is 
then, of course 

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{c} e^{-\frac{1}{2}c^{2}}\, dc$$ 

This defines 97.7 per cent. of the area of the normal curve when $c = 2$, as when $b = 200$ 
and $n = 10{,}000$, or when $b = 2{,}000$ and $n = 1{,}000{,}000$. Examination of voting records in 
the light of considerations advanced earlier strongly suggests that communities such as 
Great Britain and the U.S.A. behave at the polls as random blocks. In a community of 
50,000,000 containing 25 such blocks of 2,000,000 (and Penrose puts as a rough and ready 
figure about 15 for U.K.), one disciplined group of 2,000,000 could control about 85 per 
cent. of all decisions. 
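
The figure of 97.7 per cent. for $c = 2$ can be checked numerically. The sketch below evaluates the normal integral with the error function from the Python standard library, taking $c = b/\sqrt{n}$ as in (ii); the function name is illustrative, and only the two worked cases with $c = 2$ are evaluated.

```python
import math


def prob_ayes_fail(b, n):
    """Normal-curve area below c = b / sqrt(n): chance that n random voters do not outvote a block of b."""
    c = b / math.sqrt(n)
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))


for block, voters in ((200, 10_000), (2_000, 1_000_000)):
    print(f"b = {block}, n = {voters}: {100 * prob_ayes_fail(block, voters):.1f} per cent.")
```

Both cases give the same area because the block size enters only through its ratio to the square root of the number of indifferent voters.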

Professor Penrose would be first to admit that such considerations are schematic, but 
they are not on that account profitless. They bring a new searchlight to bear on the limiting 
size of an organization (political, industrial, or professional) consistent with effective two-way 
traffic of ideas, and may contribute to a sane revaluation of alleged benefits from organization 
on a large scale. Professor Penrose commendably draws attention to an aspect of the large- 
scale community which is of special constructive interest if civilized humanity is to survive 
the challenge of the atomic era. For the essay concludes with a table of representation of 
nationals in a world assembly calculated on the basis of the square-root law to ensure 
minimal dangers of undue influence by small pressure groups. LANCELOT HOGBEN. 





